FOREWORD

This  report  was  prepared  in  the  Structural  Mechanics  Division,  Field 
Test  and  Evaluation  Branch,  Air  Force  Flight  Dynamics  Laboratory  (AFFDL/ 
FBG),  Wright-Patterson  Air  Force  Base,  Ohio.  The  work  was  done  under 
Project  No.  1472  "Dynamic  Measurement  and  Analysis  Technology  for  Flight 
Vehicles", Task 147202 "Dynamic Data Analysis for Flight Vehicles",
Work Unit 14720204 "Statistical Reduction of Dynamic Data." The research
covered  the  period  November  1973  through  July  1976.  Mr.  Robert  G.  Merkle 
was  the  responsible  engineer  and  author  of  this  report. 


The  probability  density  plots  that  appear  as  Figures  4 through  11 
were programmed and produced by Mary Folz. Figure 13, computation of
autocorrelation values, was taken from reports prepared for the government
by  J.  S.  Bendat,  A.  G.  Piersol,  and  L.  D.  Enochson.  Other  figures  were 
produced  by  Mary  Jo  Bornhorst.  Skewness  and  Kurtosis  coefficients  for 
the  Rayleigh  and  Maxwell  Distributions  were  computed  by  H.  L.  Harter. 

The  derivations  in  Appendices  A,  B,  C,  and  D were  pointed  out  by  Robert 
J.  Wherry,  Psychology  Dept,  Ohio  State  University.  Typing  was  done  by 
Timothy  Ketzel  and  Dorothy  C.  Young.  Drafting  of  mathematical  expressions 
was  done  by  James  Sommerville  and  John  Skinner. 

SECTION I
INTRODUCTION 

Measurements  taken  from  systems  responding  to  dynamic  exci- 
tations generally  have  an  unpredictable  random  element  character- 
istic of  one  or  more  of  the  excitation  forces.  Because  of  this 
randomness,  multiple  observations  are  necessary.  Modern  automated 
multichannel  instrumentation  systems  are  capable  of  sensing  and 
recording  an  enormous  volume  of  these  excitation  and  response  meas- 
urements from  numerous  locations  and  test  conditions.  This  report 
summarizes  a number  of  definitions  and  analysis  methods  that  are 
especially  useful  in  the  statistical  treatment  of  such  voluminous 
data. 

1.  STATISTICAL  MEASURES  FOR  SINGLE  VARIABLES 

An  average  value  of  some  kind  is  generally  considered  to  be  the 
most  important  statistic  since  it  is  the  single  value  considered  best 
to  represent  an  entire  set  of  observations.  Six  different  measures 
of average value are defined: the mode, median, arithmetic mean,
quadratic mean, harmonic mean, and the geometric mean. The choice
of  which  to  use  depends  on  the  particular  application  and  some  guide- 
lines are  given  for  making  this  selection. 

The  dispersion  of  a set  of  observations  is  generally  the  second 
most  important  statistic  since  it  is  the  single  value  best  repre- 
senting the  degree  of  scatter  in  the  data.  Five  different  measures 
of dispersion are defined: the range, mean deviation, standard
deviation, variance, and the coefficient of variation. As with the
average,  the  choice  of  which  to  use  depends  on  the  particular  appli- 
cation and  some  guidelines  for  this  choice  are  given. 

The symmetry and relative concentration about the mean are two
other  statistics  important  in  describing  the  distribution  of  a set 
of  observations.  The  coefficients  of  skewness  and  kurtosis  are  de- 
fined to  measure  these  two  attributes.  These  two  statistics  are 
often  useful  in  determining  what  mathematical  probability  density 
functions  are  consistent  with  the  measured  data. 

2.  PROBABILITY  DENSITY  FUNCTIONS  FOR  SINGLE  VARIABLES 

Univariate  probability  density  functions  are  mathematical  func- 
tions of  one  variable  which  plot  as  continuous  single  valued  curves 
lying  on  or  above  the  horizontal  coordinate  axis  with  a finite  area 
between  that  axis  and  the  curve.  The  function  is  multiplied  by  the 
reciprocal  of  this  area  as  a normalizing  constant,  so  that  the  frac- 
tional area  above  any  axis  interval  represents  the  proportion  of  the 
random  variable  observations  having  values  within  that  interval. 

The probability density curve may be unbounded on both sides,
extending from minus infinity to plus infinity along the horizontal
axis; it may be bounded on one side, as in the case of all positive
numbers extending from zero to plus infinity; or it may be bounded
on both sides, as in the case of proportions extending from zero to
plus one. Using this classification system, important probability
density functions will be defined, with the mean, standard deviation,
skewness, and kurtosis of each given in terms of the constant parameters
appearing in the defining equation.

The Normal, Student t, and Cauchy Probability Density Functions
are the unbounded types defined. The normal is important not
only as a parent population for many measurements, but also as the
sampling distribution for the mean. Under fairly general conditions,
the sampling distribution of mean values computed from numerous
independent samples approaches the normal distribution as the sample
size increases, even if the parent population of the sample is not
normal. This is the reason for the importance of the normal
distribution in statistics. The Student t distribution is the sampling
distribution of the difference between population and sample means
divided by the ratio of the sample standard deviation to the square
root of the sample size. It approaches the normal as the sample size
increases. The Cauchy distribution arises from the quotients of two
independently distributed normal observations having zero means. If
both normal variates have unit variance, the resulting Cauchy
distribution is the same as a Student t for a sample size of two. The
Cauchy density is of interest because it has infinite variance.
Consequently, if two independent normally distributed observations
with zero means are divided in the course of data processing operations,
nothing at all is gained by increasing the sample size of such
quotients: the mean of any number of them is no less scattered than a
single quotient.
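
The statement that nothing is gained by averaging such quotients can be checked with characteristic functions. The following is a sketch for the unit-variance case only; the symbols $q$ (a single quotient), $\bar{q}_n$ (the mean of $n$ independent quotients), and $\varphi$ (characteristic function) are notation introduced here and do not appear in the report.

```latex
\varphi_{q}(t) = E\left[e^{itq}\right] = e^{-|t|}
\quad\Longrightarrow\quad
\varphi_{\bar{q}_n}(t)
  = \left[\varphi_{q}\!\left(\frac{t}{n}\right)\right]^{n}
  = \left[e^{-|t|/n}\right]^{n}
  = e^{-|t|}
```

Since the characteristic function of the mean equals that of a single quotient, the mean of $n$ quotients has the same Cauchy distribution for every $n$.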

The Gamma, F, Rayleigh, Maxwell, and Log-normal Probability Density
Functions are the types defined for all positive numbers. The
Gamma is important not only as a parent population for many
measurements but also because of two important special cases: the
exponential and chi-square distributions. The time intervals
between independent randomly occurring events are exponentially
distributed. The ratios of sample to population variance for independent
normal samples have a chi-square over degrees of freedom distribution.
The ratios of sample variances for pairs of independent normal samples
have an F distribution. If both rectangular coordinates for
observations in a plane have independent zero-mean normal distributions
with equal variance, then the radii in polar coordinates have a
Rayleigh distribution. If all three rectangular coordinates for
observations in space have such normal distributions, then the radii
in spherical coordinates have a Maxwell distribution. If the
logarithms of measurements are normally distributed, then the
measurements themselves are said to have a log-normal distribution.


The Beta Probability Density Function is one of prime importance
for variables bounded on both sides, that is, having finite upper and
lower limits. Two special cases are of particular interest. A
constant probability density throughout the range of the variable forms
a uniform distribution. The ordinates of a sine wave follow an arc-sine
distribution. If both rectangular coordinates for observations in a
plane have normal distributions, then the angles in polar coordinates
have a uniform distribution and the sines of those angles have an
arc-sine distribution.

3. SIGNIFICANCE TESTS FOR PAIRS OF MEANS AND VARIANCES


The  mean  and  variance  are  defined,  respectively,  as  measures  of 
average  value  and  dispersion  in  Section  II.  The  t and  F probability 
densities  are  given,  respectively,  as  distributions  for  sample  means 
and  sample  variance  ratios  in  Section  III.  In  Section  IV  these 
two  distributions  are  used  in  statistical  tests  for  significant  dif- 
ferences in  the  means  or  variances  of  two  samples. 


If  the  means  of  two  samples  differ  significantly,  then  those 
differences  in  location  or  test  condition  under  which  the  two 
samples  were  recorded  are  variables  affecting  the  measured  values. 
This is the normal experimental situation, the sample measurements
having been made precisely to determine if such differences in the
sample conditions are associated with differences in expected
magnitudes.


If  the  variances  of  two  samples  differ  significantly,  then  those 
differences  in  location  or  test  condition  under  which  the  two  samples 
were recorded are variables affecting the measured values. However,
this necessary conclusion is often missed if attention is focused
only on average values. Significantly different variances about
similar means imply greater dispersion in the one sample than the other,
and the reason for this effect is generally of considerable interest.

4.  STATISTICAL  MEASURES  OF  INTERDEPENDENCE  AMONG  VARIABLES 

The  coefficient  of  correlation  is  the  statistic  measuring  the 
degree  of  interdependence  between  two  variables.  A perfect  correlation 
of plus one between two variables x and y implies that they
may differ only in the reference point and scaling unit, so that
$y = a + bx$ or $y = \bar{y} + b(x - \bar{x})$ with $b > 0$. A perfect
correlation of minus one implies that $y = a - bx$ or
$y = \bar{y} - b(x - \bar{x})$ with $b > 0$. Zero correlation implies no
linear relationship at all. Intermediate positive and negative values,
of course, imply imperfect correlation in which an error term e varying
randomly with each measurement appears in the relationship,
$y = a + bx + e$.
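
The three cases above can be illustrated numerically. The following is a minimal sketch, assuming the usual product-moment formula for the correlation coefficient (the report's own defining equation lies outside this section); the function name `pearson_r` and the sample values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples of at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]

# y = a + bx with b > 0: perfect positive correlation.
r_pos = pearson_r(x, [2.0 + 3.0 * v for v in x])

# y = a - bx with b > 0: perfect negative correlation.
r_neg = pearson_r(x, [2.0 - 3.0 * v for v in x])

# y = a + bx + e: an error term (illustrative values) lowers the magnitude.
errors = [0.5, -1.0, 0.8, -0.6, 0.3]
r_mid = pearson_r(x, [2.0 + 3.0 * v + e for v, e in zip(x, errors)])

print(r_pos, r_neg, r_mid)
```

The first two results equal plus and minus one to within rounding, while the third lies strictly between zero and one.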

Eight  types  of  bivariate  correlations  are  defined  to  accommo- 
date simultaneous  or  sequential  types  of  quantitative  data  and 
dichotomous  or  multichotomous  types  of  qualitative  classification 
criteria. 

a.  Simple  correlation  is  computed  from  a set  of  paired  meas- 
urements of  two  different  quantitative  variables. 

b. Autocorrelation (or serial correlation) is computed from
a single set of sequential measurements, with each value representing
an observation of the first variable and the corresponding
observation of the second variable given by the value a fixed number of
steps later in the sequence. This is useful in finding any
periodicities in a sequence of measurements.

c.  Cross-correlation  is  computed  from  a dual  set  of  sequential 
measurements  representing  two  different  quantitative  variables  with 
the  second  variable  of  the  paired  observations  displaced  a fixed 
number  of  steps  later  in  its  sequence  than  the  first.  This  is  useful 
in  excitation-response  relationships. 

d. Rank Correlation is computed from a set of paired ranks
obtained  for  each  bivariate  observation  from  their  positions  in  the 
ordered  sequence  of  values  for  each  variable.  This  is  useful  when 
one  or  both  of  the  variables  cannot  be  measured  directly  but  can  be 
ranked  according  to  size. 

e.  Point-Biserial  Correlation  is  computed  from  a set  of  paired 
observations  one  of  which  is  the  value  of  a quantitative  variable  and 
the  other  is  a zero  or  one  value  of  a qualitative  variable  expressing 
the  absence  or  presence  of  a given  attribute. 

f.  Tetrachoric  Correlation  is  computed  from  a set  of  paired 
observations  both  of  which  are  zero  or  one  values  expressing  the 
absence  or  presence  of  different  attributes. 

g.  Coefficient  of  Contingency  and  Correlation  of  Attributes 
are  both  computed  from  contingency  tables  showing  the  number  of  ob- 
servations in  each  cell  of  a matrix  in  which  the  number  of  rows  and 
the  number  of  columns  represent  the  numbers  of  classification  cat- 
egories for  the  two  variables.  In  both  cases  identical  row  distri- 
butions in  all  columns  and  identical  column  distributions  in  all 
rows  imply  zero  correlation.  For  square  matrices  all  observations 
on  the  diagonal  imply  perfect  correlation  and  the  correlation  of 
attributes is one. For nonsquare matrices there is no unique
diagonal, and the coefficient of contingency has a maximum value less
than one.

Whatever the type, even perfect correlations cannot alone
indicate a cause-effect relation. Variations in either variable may be
caused by changes in the other, or variations in both may be caused by
changes in some third variable.
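
The lagged pairings described under autocorrelation and cross-correlation (items b and c) can be sketched as follows. This is an illustration only, assuming the product-moment coefficient is applied to the overlapping pairs; the function names, the 20-step sine record, and the 3-step delayed response are all hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(first, second, lag):
    """Correlate first[i] with second[i + lag] for a non-negative lag."""
    pairs = min(len(first), len(second) - lag)
    if lag < 0 or pairs < 2:
        raise ValueError("lag must leave at least two overlapping pairs")
    return pearson_r(first[:pairs], second[lag:lag + pairs])


# A record with a period of 20 steps.
record = [math.sin(2.0 * math.pi * i / 20.0) for i in range(200)]

# Autocorrelation: the same record supplies both variables.
auto_full_period = lagged_correlation(record, record, 20)
auto_half_period = lagged_correlation(record, record, 10)

# Cross-correlation: a response that repeats the excitation 3 steps later.
response = [0.0] * 3 + [2.0 * v for v in record[:-3]]
cross_at_delay = lagged_correlation(record, response, 3)

print(auto_full_period, auto_half_period, cross_at_delay)
```

The autocorrelation returns to plus one at a lag of one full period and reaches minus one at half a period, which is how periodicities show up; the cross-correlation reaches plus one at the lag matching the response delay.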

Six types of multiple correlation are defined, the first four
computed  from  matrix  arrays  of  bivariate  correlations  and  the  last 
two  computed  from  ensembles  of  sequenced  observations. 

a.  Multiple  Correlation  measures  the  degree  of  relationship 
between  one  dependent  variable  and  an  entire  set  of  independent 
variables  all  taken  together. 

b.  Marginal  Correlation  measures  the  degree  of  relationship 
between  one  dependent  variable  and  a subset  of  the  independent  var- 
iables with  all  remaining  independent  variables  simply  ignored. 

c.  Conditional  or  Partial  Correlation  measures  the  degree  of 
relationship  between  one  dependent  variable  and  a subset  of  the  in- 
dependent variables  after  statistically  adjusting  for  the  effects  of 
all  the  remaining  independent  variables. 

d.  Canonical  Correlation  measures  the  degrees  of  relationship 
between  a set  of  dependent  variables  and  a set  of  independent  var- 
iables. Marginal  and  conditional  canonical  correlations  could  also 
be  defined  as  above  by  either  ignoring  or  statistically  adjusting  for 
a subset  of  the  independent  variables. 

e. Autocorrelation measures the degree of relationship
between measurements taken at any pair of sequence points from an
ensemble of sequenced data records. Varying the pair of sequence
points leads to a matrix of correlations.

f. Cross-correlation measures the degree of relationship
between measurements taken at one fixed sequence point from one
ensemble and at another fixed sequence point from another ensemble.
Varying this pair of sequence points leads to a correlation matrix.
The two ensembles are the set of excitation or stimulus points and
the set of response points.

5.  FACTOR  ANALYSIS  OF  MULTIPLE  VARIABLES 

Large  numbers  of  observations  of  a single  variable  are  reduced 
to  a few  statistics  describing  average  value,  dispersion,  skewness, 
and  kurtosis.  Large  numbers  of  interrelated  variables  may  also  be 
statistically  reduced  to  a relatively  few  independent  factors  des- 
cribing the  essential  properties  of  a physical,  biological,  or  social 
system.  Two  very  highly  correlated  variables  may  be  assumed  to  be 
measuring  the  same  underlying  factor.  Therefore,  in  the  simplest 
case  of  factor  analysis  all  variables  may  be  divided  into  a few 
groups such that the correlation is very high for any two variables
from  the  same  group  and  very  low  for  any  two  variables  from  different 
groups.  The  groups  represent  the  factors.  In  more  complex  cases, 
some  of  the  variables  are  composites  of  two  or  more  factors. 

6.  MATHEMATICAL  MODELS  FOR  STATISTICAL  DATA 

Interdependence among variables can be used to estimate any one
of them as a dependent variable in terms of the others as independent
variables.  For  qualitative  variables  the  analysis  of  variance  model 
is  given  by  a general  term  plus  a positive  or  negative  increment 
associated  with  each  main  category  of  observations  plus  additional 
positive  or  negative  increments  due  to  interaction  effects  that  may 
arise from the simultaneous use of two or more classification systems
to  categorize  the  observations.  For  quantitative  variables  the  re- 
gression model  is  given  by  a constant  term  plus  a series  of  products, 
each  a coefficient  times  an  independent  variable.  By  using  anti- 
logarithms the  analysis  of  variance  model  may  be  transformed  into 
the  product  of  a general  term  times  main  effects  times  interaction 
effects. The regression model sum of products may be transformed
into a product of powers in which the coefficients appear as
exponents.
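
The two antilogarithm transformations can be written out as a sketch, assuming the additive models are fitted to logarithms of the data. The symbols below ($\mu$ for the general term, $\alpha_i$ and $\beta_j$ for main effects, $(\alpha\beta)_{ij}$ for an interaction, $b_k$ for regression coefficients) are notation assumed here for illustration, not the report's own.

```latex
% Analysis of variance: a sum of effects becomes a product of effects
\ln y_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}
\quad\Longrightarrow\quad
y_{ij} = e^{\mu}\, e^{\alpha_i}\, e^{\beta_j}\, e^{(\alpha\beta)_{ij}}

% Regression: a sum of products becomes a product of powers
\ln y = b_0 + \sum_{k} b_k \ln x_k
\quad\Longrightarrow\quad
y = e^{b_0} \prod_{k} x_k^{\,b_k}
```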

General  linear  hypothesis  models  incorporate  both  the  analysis 
of  variance  for  qualitative  variables  and  regression  analysis  for 
quantitative  variables.  In  addition  to  estimates  for  regression 
coefficients,  general  terms,  category  effects,  and  interactions,  the 
general  linear  hypothesis  also  provides  statistical  tests  to  determine 
whether  any  one  or  a combination  of  these  estimates  differ  signif- 
icantly from  zero.  This  permits  the  formulation  of  an  optimum  pre- 
diction function  containing  only  those  independent  variables  that 
significantly  affect  the  value  of  the  dependent  variable. 

All  the  definitions  included  in  Section  I for  univariate  statistics, 
probability  densities,  and  measures  of  correlation  could  not  be  found  in 
any  single  source.  However,  any  one  of  them  could  be  found  in  several 
other  sources.  For  this  reason  specific  references  are  not  cited  in  the 
text.  Instead  a bibliography  is  given  listing  some  of  the  more  compre- 
hensive sources  from  which  more  information  may  be  obtained.  Notations 
in  the  bibliography  indicate  the  subject  matter  for  which  each  reference 
is  given. 

SECTION II

STATISTICAL  MEASURES  FOR  SINGLE  VARIABLES 

Measurements  of  a random  variable  require  multiple  observations 
because  of  unpredictable  variations  and  errors  associated  with  each 
measurement.  Such  a set  of  observations  for  any  variable  can  be 
characterized  by  an  overall  average  value  as  well  as  some  measure  of 
the  dispersion  or  scatter  in  the  data.  In  addition,  whenever  such 
a distribution  of  observed  values  is  arranged  in  the  bar  graph  form, 
characteristics  related  to  the  symmetry  and  shape  of  the  distribution 
become  evident.  The  concepts  are  quantified  in  the  next  few  para- 
graphs. 

Let $x_i$, $i = 1, \ldots, n$ denote a sample of n observations of the
variable x. If the range of x is subdivided into a series of m
adjacent intervals by the ordered sequence of values $x_k$,
$k = 0, \ldots, m$, with $m \ll n$, then the function $y_k = f(x_k)$ can
be defined to mean the number of observations that fall in the interval
$x_{k-1} < x < x_k$. An example of such a function is plotted as a
histogram in Figure 1a. For large numbers of observations and more
refined subdivisions, such histograms approximate continuous
mathematical functions such as the one illustrated in Figure 1b. Here
the total area under the curve f(x) represents the total number of
observations and the shaded area between the curve and any fixed
interval on the x axis represents the number of observations expected
to fall in that interval.

If the area under the curve in Figure 1b is normalized to one by dividing
f(x) by the total number of observations, the shaded area then represents
the probability that a randomly selected observation $x_i$ will fall in the
underlying x interval. The probability density, p(x), the value of the
ordinate in such a normalized curve, is defined as the limit of the ratio
of the probability associated with a given interval to the length of the
interval as the latter approaches zero. Probability density functions
are therefore used to define infinite populations from which any given
set of measurements constitutes only a small sample.

Figure 1. Multiple Observations of One Random Variable
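
The counting and normalization steps described above can be sketched in a few lines. This is an illustration under stated assumptions: intervals are taken as closed on the right so that every observation lands in exactly one of them, and the function names, observations, and interval edges are hypothetical.

```python
def histogram_counts(observations, edges):
    """Count the observations falling in each interval edges[k-1] < x <= edges[k]."""
    counts = [0] * (len(edges) - 1)
    for x in observations:
        for k in range(1, len(edges)):
            if edges[k - 1] < x <= edges[k]:
                counts[k - 1] += 1
                break
    return counts


def density_estimate(counts, edges):
    """Scale counts so that the total area of the histogram bars equals one."""
    total = sum(counts)
    return [c / (total * (edges[k + 1] - edges[k])) for k, c in enumerate(counts)]


observations = [0.2, 0.7, 1.1, 1.4, 1.6, 1.9, 2.3, 2.8, 3.5, 1.5]
edges = [0.0, 1.0, 2.0, 3.0, 4.0]

counts = histogram_counts(observations, edges)
density = density_estimate(counts, edges)

# Area of the normalized histogram: sum of bar height times interval width.
area = sum(d * (edges[k + 1] - edges[k]) for k, d in enumerate(density))
print(counts, density, area)
```

After normalization, the area above any interval is the fraction of observations in that interval, which is the histogram counterpart of the probability described in the text.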

Statistics are numbers selected or computed from sample data to
define  various  characteristics  of  the  sample  as  a whole.  Likewise, 
the  mathematical  expression  for  every  probability  density  function 
(pdf)  contains  one  or  more  constant  parameters  that  define  character- 
istics of  the  whole  population.  Sample  statistics  are  therefore  used 
to  estimate  unknown  population  parameters  describing  the  same  general 
characteristic.  The  characteristics  of  chief  interest  include  meas- 
ures of  average  value,  measures  of  dispersion  or  scatter,  and  meas- 
ures of  the  symmetry  and  shape  of  the  histogram  or  probability  den- 
sity function.  In  the  next  three  sections  these  characteristics 
are defined, and in most cases computing formulas are given both for
statistics in terms of summations of sample data and for parameters
in terms of integrals of probability density functions.


AFFDL-TR-76-83 

1.  MEASURES  OF  AVERAGE  VALUES 

Mode - The most frequently occurring value in a sample or
population is called the mode. In Figure 1b, the mode is at point a,
the value of x for which f(x) is a maximum.

Median - The midvalue in the sequence of observations ordered
from lowest to highest is called the median. In Figure 1b, the
median is at point b since a vertical line through b divides the
area under the curve exactly in half.

Arithmetic  Mean  - The  sum  of  all  observations  divided  by  the 
number  of  observations  is  called  the  arithmetic  mean: 

$$\bar{x} = \sum_{i=1}^{n} x_i / n \qquad \mu = \int_{-\infty}^{\infty} x\,p(x)\,dx \qquad (1)$$


In Figure 1b, the mean is at point c, the x coordinate of the centroid
(or balance point) of the area enclosed by the curve and the x axis.


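Quadratic Mean - The square root of the mean of the squares of
the observations is called the quadratic mean or the root mean square
(rms):

$$x_{rms} = \left[\sum_{i=1}^{n} x_i^2 / n\right]^{1/2} \qquad \mu_{rms} = \left[\int_{-\infty}^{\infty} x^2\,p(x)\,dx\right]^{1/2} \qquad (2)$$

Harmonic Mean - The reciprocal of the mean of the reciprocals of
the observations is called the harmonic mean:

$$x_{harm} = \left[\sum_{i=1}^{n} x_i^{-1} / n\right]^{-1} \qquad \mu_{harm} = \left[\int_{-\infty}^{\infty} x^{-1}\,p(x)\,dx\right]^{-1} \qquad (3)$$

For positive observations that are not all equal, the harmonic mean
is always less than the arithmetic mean and would be to the left of
point c in Figure 1b.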

Geometric Mean - The anti-logarithm of the mean of the
logarithms of the observations is called the geometric mean:

$$x_{geom} = \exp\left[\sum_{i=1}^{n} \ln x_i / n\right] \qquad \mu_{geom} = \exp\left[\int p(x) \ln x\,dx\right] \qquad (4)$$

The nth root of the product of all n observations is also the
geometric mean since:

$$x_{geom} = \exp\left[\sum_{i=1}^{n} \ln x_i / n\right] = \exp\left[\ln \prod_{i=1}^{n} x_i^{1/n}\right] = \left[\prod_{i=1}^{n} x_i\right]^{1/n} \qquad (4a)$$

The integral form does not exist in this case since an integral is
the limit of a sum and there is no corresponding symbol for the limit
of a product. The geometric mean is smaller than the arithmetic
mean but larger than the harmonic mean for all positive observations.

Other Measures of Average Value - The quadratic, harmonic, and
geometric means were defined by first taking functions of the
observations - the square, reciprocal, and logarithm respectively; then
computing the arithmetic mean; and finally taking the inverse
functions - the square root, the reciprocal, and the anti-logarithm
respectively. This same procedure can be employed by using other
functions to generate definitions for other kinds of average values. The
median values are unaffected by this process.
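
The three-step procedure just described (apply a function, take the arithmetic mean, apply the inverse function) can be sketched directly. This is an illustration only; the helper name `transformed_mean` and the sample values are hypothetical, and positive observations are assumed so that the reciprocal and logarithm exist.

```python
import math


def transformed_mean(data, func, inverse):
    """Apply func to each observation, average, then apply the inverse."""
    return inverse(sum(func(x) for x in data) / len(data))


data = [1.0, 2.0, 4.0, 8.0]  # positive observations only

arithmetic = transformed_mean(data, lambda x: x, lambda y: y)
quadratic = transformed_mean(data, lambda x: x * x, math.sqrt)
harmonic = transformed_mean(data, lambda x: 1.0 / x, lambda y: 1.0 / y)
geometric = transformed_mean(data, math.log, math.exp)

# The geometric mean is also the nth root of the product of the observations.
nth_root_of_product = math.prod(data) ** (1.0 / len(data))

print(harmonic, geometric, arithmetic, quadratic)
```

For this sample the results fall in the order harmonic, geometric, arithmetic, quadratic, consistent with the ordering stated in the text for positive observations.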

Selecting Appropriate Averages - The physical interpretation
and  application  of  the  data  are  generally  vital  considerations  in 
selecting  the  appropriate  kind  of  average  value  to  use.  The  char- 
acter of  the  data  itself  will  provide  some  guidelines.  For  data 
oscillating  randomly  about  zero,  the  arithmetic  mean  is  not  useful 
since  offsetting  positive  and  negative  fluctuations  will  always  make 
it  nearly  zero  irrespective  of  the  magnitude  of  the  oscillations  - 
in  this  case  the  quadratic  mean  giving  the  root  mean  square  value 
would  be  more  useful.  For  data  containing  zeros  and  negative  values 
the  geometric  mean  is  not  suitable  since  even  a single  zero  valued 
observation  leads  to  a zero  mean  irrespective  of  other  data,  and  an 
odd  number  of  negative  observations  will  lead  to  an  imaginary  mean 
if  the  total  number  of  observations  is  even. 


By selecting particular definitions of the mean the resulting
value can be made larger or smaller almost at will. Consider Equation
5, which defines the quadratic mean for m = 2 and the harmonic mean for
m = -1:

$$x_m = \left[\sum_{i=1}^{n} x_i^m / n\right]^{1/m} \qquad (5)$$


As  m becomes  increasingly  positive  the  "mean"  defined  approaches  the 
maximum  value  of  x.  As  m becomes  increasingly  negative  the  "mean" 
defined  approaches  the  minimum  value  of  x.  Consequently,  by  this  or 
other  less  evident  methods,  a mean  value  definition  can  be  selected 
to  obtain  a result  almost  anywhere  in  the  range  of  the  data  being 
analyzed. 
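
The drift of this family of means toward the extremes of the data can be seen numerically. The sketch below assumes the power-mean form implied by the text (the mean of the m-th powers raised to the power 1/m) and positive observations; the function name `power_mean` and the sample values are hypothetical.

```python
import math


def power_mean(data, m):
    """Mean of the m-th powers raised to the power 1/m (m must be nonzero)."""
    if m == 0:
        raise ValueError("m = 0 is not covered by this formula")
    return (sum(x ** m for x in data) / len(data)) ** (1.0 / m)


data = [1.0, 2.0, 4.0, 8.0]  # positive observations only

# Increasingly positive m moves the result toward max(data);
# increasingly negative m moves it toward min(data).
results = {m: power_mean(data, m) for m in (-50, -1, 1, 2, 50)}
for m, value in results.items():
    print(m, value)
```

With these values the result stays between the minimum of 1 and the maximum of 8, moving close to 8 for m = 50 and close to 1 for m = -50.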

Selecting the correct kind of average to employ requires a
correct understanding of the origin, interpretation, and application
of the data involved. The following example is often cited: A
vehicle travels 120 kilometers at 40 kilometers per hour and then
another  120  kilometers  at  60  kilometers  per  hour.  What  is  the  mean 
speed?  The  arithmetic  mean  (40+60)/2  = 50  is  not  correct.  The 
vehicle  requires  three  hours  for  the  first  segment  and  two  hours 
for  the  second.  Dividing  the  total  240  kilometers  distance  by  the 
total  five  hour  time  gives  an  average  of  48  kilometers  per  hour. 

This is the value of the harmonic mean: $[(40^{-1} + 60^{-1})/2]^{-1} = 48$.

If  the  problem  had  been  given  with  equal  times  rather  than  equal  dis- 
tances at  the  two  speeds  then  the  arithmetic  mean  would  have  given 
the  correct  value.  For  other  kinds  of  problems  a similar  reasoning 
process  must  be  employed.  No  general  rule  can  be  given  for  selecting 
the  correct  kind  of  average  to  use  in  every  application. 
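
The vehicle example can be verified with a short calculation. This sketch uses the `harmonic_mean` and `mean` functions from Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import harmonic_mean, mean

speeds = [40.0, 60.0]        # kilometers per hour
distances = [120.0, 120.0]   # kilometers, equal for both segments

# Equal distances: total distance divided by total time.
total_time = sum(d / v for d, v in zip(distances, speeds))  # 3 h + 2 h
speed_equal_distances = sum(distances) / total_time

# Equal times instead: 2.5 hours at each speed.
times = [2.5, 2.5]
speed_equal_times = sum(v * t for v, t in zip(speeds, times)) / sum(times)

print(speed_equal_distances, harmonic_mean(speeds))
print(speed_equal_times, mean(speeds))
```

The equal-distance case agrees with the harmonic mean of the two speeds, and the equal-time case agrees with their arithmetic mean.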

2.  MEASURES  OF  DISPERSION 

Range - The difference between the largest and smallest values
in a set of observations is called the range. In Figure 1a the
sample range is $x_m - x_0$; in Figure 1b the population range is
infinite.

Mean Deviation - The mean absolute deviation of each
observation from some average value for all observations is called the mean
deviation. The median is the particular average value customarily
used since the mean deviation is smaller about the median value than
about any other number.

Standard Deviation - The square root of the mean squared
deviation of each observation from the mean of all observations is called
the standard deviation (std. dev.):

$$s = \left[\sum_{i=1}^{n} (x_i - \bar{x})^2 / (n-1)\right]^{1/2} \qquad \sigma = \left[\int_{-\infty}^{\infty} (x - \mu)^2\,p(x)\,dx\right]^{1/2} \qquad (6)$$


This root mean square deviation is smaller about the mean value than
about any other number. Division by n-1 rather than n in Equation 6
compensates for this minimization bias that results from the use of
the same sample to compute both mean and dispersion. If $\bar{x}$ were
the mean of the population, the sum of squared deviations would be
somewhat larger and the divisor would be n to obtain an unbiased
estimate of the population variance, the square of the standard
deviation.

Variance  - The  mean  square  deviation  about  the  mean  is  called 
the  variance.  It  is  the  square  of  the  standard  deviation. 

Coefficient  of  Variation  - The  ratio  of  the  standard  deviation 
to  the  arithmetic  mean  is  called  the  coefficient  of  variation. 

Clearly  it  is  a relative  measure  expressing  the  dispersion  as  a frac- 
tion of  the  mean  value. 
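
The five measures of dispersion defined above can be computed together for a small sample. This is a sketch under stated assumptions: the mean deviation is taken about the median, and the variance and standard deviation use the n-1 divisor discussed in the text; the function name and sample values are hypothetical.

```python
import math
from statistics import median


def dispersion_measures(data):
    """Range, mean deviation, variance, standard deviation, coefficient of variation."""
    n = len(data)
    mean = sum(data) / n
    med = median(data)
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)  # n-1 divisor
    std_dev = math.sqrt(variance)
    return {
        "range": max(data) - min(data),
        "mean_deviation": sum(abs(x - med) for x in data) / n,  # about the median
        "variance": variance,
        "std_dev": std_dev,
        "coeff_variation": std_dev / mean,  # meaningful only for positive means
    }


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
measures = dispersion_measures(data)
for name, value in measures.items():
    print(name, value)
```

The coefficient of variation is dimensionless, which is what makes it a relative measure; it loses meaning when the mean is negative or near zero.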

Other  Dispersion  Values  - As  with  other  mean  values,  definitions 
for  other  measures  of  dispersion  can  be  generated  by  first  taking 
functions  of  observations,  then  computing  the  selected  measures  of 
dispersion,  and  finally  taking  the  inverse  function  of  this  result. 

Selecting Appropriate Dispersions - As with selection of average
values  the  physical  interpretation  and  application  of  the  data  are 
generally  vital  considerations  in  selecting  the  appropriate  measure 
of  dispersion  to  use.  Again  as  with  selection  of  averages,  the  char- 
acter of  the  data  provides  some  guidelines  - for  example,  avoiding 
relative  measures  of  dispersion  for  data  with  negative  or  near  zero 
mean  values.  With  dispersions,  however,  one  additional  factor  from 
sampling  theory  has  a bearing.  The  extreme  values,  i.e.,  the  max- 
imum and  minimum,  from  sets  of  measurements  are  subject  to  a very 
high  degree  of  sampling  variation.  Consequently  the  range  and  other 
measures  of  dispersion  using  these  extreme  values  are  very  unreliable 
as  broad  measures  of  dispersion  for  the  data  as  a whole. 

3.  MEASURES  OF  SKEWNESS  AND  KURTOSIS 

Both  the  mean  and  the  variance  of  a random  variable  are  re- 
lated to  a more  general  set  of  statistics  called  the  moments  of  a 
probability  density  function.  Moments  are  useful  in  specifying  the 
shape  of  a pdf.  The  jth  moment  about  point  a is  defined  as  follows 
for  observed  data  and  probability  density  functions,  respectively: 


$$m_j = \sum_{i=1}^{n} (x_i - a)^j / n \qquad \mu_j = \int_{-\infty}^{\infty} (x - a)^j\,p(x)\,dx \qquad j = 1, 2, \ldots \qquad (7)$$


The first moment about zero is simply the arithmetic mean. The second
moment about the mean is the variance. Third and fourth moments about
the mean are used in computing coefficients of skewness and kurtosis,
which are associated with the symmetry and peakedness of a
probability density function.

a.  Standardized Data

In order to specify the higher moments in a more useful form, the
effect of uniform changes in the observations on their mean value and
standard deviation should be noted. Adding or subtracting a constant
to each observation will add or subtract the same amount to the mean
value but leave the standard deviation unchanged. On the other hand,
multiplying or dividing each observation by a constant value will
multiply or divide both the mean value and the standard deviation by
the same constant. In either kind of uniform change the relative
magnitudes of the deviations from the mean, and consequently the
shape of the histogram or frequency distribution function, remain
unchanged.


A special case of uniform changes is of interest: first subtracting
the mean value from each observation and then dividing the result by
the standard deviation to form a new set of standardized data having
a mean value of zero and a standard deviation of one. Standardized
data are quite useful in studying characteristics of statistical data
related to the shape of the probability density function and in
measuring the correlation between two variables.
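
The standardization just described can be sketched in a few lines of
Python. The function name `standardize`, the sample values, and the
use of the divisor-n (root mean square) standard deviation are
assumptions made for illustration and are not taken from the report.

```python
import math

def standardize(values):
    """Return the standardized values (x - mean) / std.

    Assumption: std is the root mean square deviation (divisor n).
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return [(x - mean) / std for x in values]

# Hypothetical observations with mean 5 and standard deviation 2.
z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(z)
```

The standardized values average to zero and have a mean square of
one, whatever the units of the original observations.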


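b.  Moments of Standardized Data

The first four moments of the standardized value of x, for both
summations of sample data and integrals of probability density
functions, are as follows:

First Moment

$$ \int_{-\infty}^{\infty}\left(\frac{x-\mu}{\sigma}\right)p(x)\,dx = \frac{1}{\sigma}\int_{-\infty}^{\infty}(x-\mu)\,p(x)\,dx = 0 $$

Second Moment

$$ \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s}\right)^2 = \frac{1}{s^2}\cdot\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 = 1 $$

$$ \int_{-\infty}^{\infty}\left(\frac{x-\mu}{\sigma}\right)^2 p(x)\,dx = \frac{1}{\sigma^2}\int_{-\infty}^{\infty}(x-\mu)^2\,p(x)\,dx = 1 $$

Third Moment

$$ \int_{-\infty}^{\infty}\left(\frac{x-\mu}{\sigma}\right)^3 p(x)\,dx = \frac{1}{\sigma^3}\int_{-\infty}^{\infty}(x-\mu)^3\,p(x)\,dx = \alpha_3 \tag{8} $$

Fourth Moment

$$ \int_{-\infty}^{\infty}\left(\frac{x-\mu}{\sigma}\right)^4 p(x)\,dx = \frac{1}{\sigma^4}\int_{-\infty}^{\infty}(x-\mu)^4\,p(x)\,dx = \beta_2 = \alpha_4 \tag{9} $$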

Skewness - The standardized third moment, $\alpha_3$, called the
coefficient of skewness, is necessarily zero for symmetric
probability density functions in which p(x) = p(-x). (Sufficient
conditions for symmetry require that all odd moments equal zero.)
For probability density functions skewed to the right as in
Figure 1b, $\alpha_3 > 0$; for those skewed to the left,
$\alpha_3 < 0$.

Kurtosis - The standardized fourth moment, $\alpha_4$, called the
kurtosis, is associated with the varying degree of concentration
about the mean that is possible for probability density functions
having the same mean and standard deviation, as shown in Figure 2.


Figure  2.  Kurtosis  Variations  in  Probability  Density  Functions 

Values  of  the  skewness  and  the  kurtosis  for  several 
probability  densities  of  theoretical  and  practical  interest  are  given 
in  the  next  section. 
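
As an illustration of the sample forms of the third and fourth
standardized moments, the sketch below computes both coefficients
from a list of observations. The function name `shape_coefficients`
and the data are hypothetical, and the divisor-n standard deviation
is assumed so that the second standardized moment equals one.

```python
import math

def shape_coefficients(values):
    """Return (skewness, kurtosis) as the third and fourth moments
    of the standardized data."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    z = [(x - mean) / std for x in values]
    skewness = sum(v ** 3 for v in z) / n
    kurtosis = sum(v ** 4 for v in z) / n
    return skewness, kurtosis

# Hypothetical sample with a longer tail to the right.
skew, kurt = shape_coefficients([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# A sample symmetric about its mean has zero skewness.
sym_skew, _ = shape_coefficients([1.0, 2.0, 3.0, 4.0, 5.0])
print(skew, kurt, sym_skew)
```

For the first sample the skewness is positive, consistent with the
sign convention for densities skewed to the right.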

SECTION III

PROBABILITY DENSITY FUNCTIONS FOR SINGLE VARIABLES

Square, sinusoidal, triangular, and random waves commonly used in
instrumentation laboratories are each associated with probability
density functions of theoretical interest, as shown in Figure 3. The
square wave amplitude, lying alternately at the positive and negative
extremes, is represented by a discrete probability density function
with delta functions at those extremes and zero elsewhere. The sine
wave amplitude, crossing the t axis at a steep angle and then
flattening out at the extremes, is represented by the U-shaped
arc-sine distribution with a minimum at zero amplitude. The
triangular wave, uniformly crossing all amplitudes within the wave
range, is represented by the flat uniform distribution. The random
wave amplitude, lying predominantly near the t axis, is represented
by the bell-shaped normal distribution. For the random wave, the
interval between a reference point and the nearest axis crossing,
negative to the left and positive to the right of the reference
point, is represented by the double exponential distribution.

In each case shown in Figure 3 the mean value $\mu$ is set equal to
zero by so locating the t and y axes; the root mean square value (or
standard deviation) is set equal to $\sigma$ by choosing the
appropriate wave amplitude; and the coefficient of skewness is zero
due to the symmetry with respect to positive and negative values.
The kurtosis values indicated in Figure 3 therefore illustrate the
values associated with the corresponding pdf shapes. Each of those
probability density functions may be given a nonzero mean,
$\mu \neq 0$, by replacing x with x - $\mu$ in its mathematical
expression. This corresponds to vertical shifts in the waveform and
horizontal shifts in the density curve.

1.  DISTRIBUTIONS UNBOUNDED ON BOTH SIDES

a.  Normal Distribution

The normal or Gaussian probability density function, the most
important one for unbounded random variables, is given by Equation 10
and shown in Figure 4:

$$ p(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/2\sigma^2} \tag{10} $$

    mean   = $\mu$        st. dev. = $\sigma$
    median = $\mu$        skewness = 0
    mode   = $\mu$        kurtosis = 3

In Figure 4, variations in the location parameter $\mu$ will shift
the curve to the left or right, while variations in the scaling
parameter $\sigma$ will expand or contract the x axis. Inflection
points occur at $x = \mu \pm \sigma$. Given n independently
distributed normal variables with parameters
$(\mu_1,\sigma_1), \ldots, (\mu_n,\sigma_n)$, their sum is also
normally distributed with parameters
$(\mu_1+\cdots+\mu_n,\ \sqrt{\sigma_1^2+\cdots+\sigma_n^2})$. The
difference between two such variables is again normally distributed
with parameters $(\mu_1-\mu_2,\ \sqrt{\sigma_1^2+\sigma_2^2})$.


The normal distribution can be derived as the model for processes in
which total measurement error results from the summation of many
small errors. It can therefore be used to express the probability
that the value of such a stochastic variable will lie within a given
interval. For example, in the acoustic or electronic noise signal
represented by the random wave of Figure 3, the probability that the
signal lies within one standard deviation of the mean is estimated by
the ratio of the time that the wave amplitude x(t) lies in the
interval $-\sigma < x(t) < \sigma$ to the total time of measurement.
The same probability is also represented by the area under the
adjacent normal pdf curve between the lines $x = -\sigma$ and
$x = +\sigma$.
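
Both properties of the normal distribution noted in this subsection
can be checked numerically. The sketch below uses `NormalDist` from
the Python standard library `statistics` module (Python 3.8 or
later), whose `+` and `-` operators on two instances treat them as
independent variables. The parameter values are hypothetical.

```python
from statistics import NormalDist

# Probability that a normal variable lies within one standard
# deviation of its mean (area under the pdf from -sigma to +sigma).
unit = NormalDist(mu=0.0, sigma=1.0)
p_within_one_sigma = unit.cdf(1.0) - unit.cdf(-1.0)

# Sum and difference of two independent normal variables.
x1 = NormalDist(mu=1.0, sigma=3.0)
x2 = NormalDist(mu=2.0, sigma=4.0)
total = x1 + x2        # mean 1 + 2, st. dev. sqrt(3**2 + 4**2)
difference = x1 - x2   # mean 1 - 2, st. dev. sqrt(3**2 + 4**2)

print(p_within_one_sigma, total, difference)
```

The area within one standard deviation is about 0.68, and the
standard deviations combine as a root sum of squares for both the sum
and the difference.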

b.  Sampling Distribution of the Mean

The normal pdf is not only the basic distribution law for many
stochastic processes but also plays a vital role in statistics. If
samples of size n are repeatedly drawn from a normal population and
mean values are computed, these means themselves constitute a new
normal distribution with a mean value equal to the mean of the parent
population and a standard deviation equal to the standard deviation
of the parent population divided by the square root of the sample
size. The sampling distribution of the mean for a normal population
with mean $\mu$ and standard deviation $\sigma$ is therefore given by
Equation 10a:

$$ p(\bar{x}) = \frac{1}{(\sigma/\sqrt{n})\sqrt{2\pi}}\,e^{-(\bar{x}-\mu)^2/(2\sigma^2/n)} \tag{10a} $$

    mean   = $\mu$        std. error = $\sigma/\sqrt{n}$
    median = $\mu$        skewness   = 0
    mode   = $\mu$        kurtosis   = 3

Consequently, a mean value computed from one sample is, in reality,
a single observation from this sampling distribution of the mean with
$\sigma/\sqrt{n}$, called the standard error, as its measure of
variability. Accordingly, Equation 10a can be employed: (1) to obtain
confidence intervals for sample means, or (2) to determine if a
sample mean represents a normal universe with a known mean and
variance, or (3) to determine if two sample means represent the same
normal universe with known variance, or (4) to determine equivalence
of means for two samples from different universes with known
variances. Even when the parent population is not normally
distributed, so long as it has a finite variance, the distribution of
sample means still approaches the normal distribution as the sample
size increases. It is this property that gives the normal probability
density function its position of importance in statistics.
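
A minimal sketch of application (1), assuming the population standard
deviation is known: the standard error scales the unit normal
quantile to give a two-sided interval about an observed sample mean.
The function names and the numerical values are hypothetical;
`NormalDist.inv_cdf` is from the Python standard library.

```python
import math
from statistics import NormalDist

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the mean."""
    return sigma / math.sqrt(n)

def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Two-sided interval for the population mean when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * standard_error(sigma, n)
    return sample_mean - half_width, sample_mean + half_width

# Hypothetical case: sample mean 10 from n = 16 observations, sigma = 2.
low, high = mean_confidence_interval(sample_mean=10.0, sigma=2.0, n=16)
print(low, high)
```

Quadrupling the sample size halves the standard error and therefore
halves the width of the interval.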

c.  Student t Distribution

Both the mean and the standard deviation of a normal population must
be known to obtain the distribution of sample means using
Equation 10a. If the mean is given but the standard deviation must be
estimated from the sample, a simple substitution of s for $\sigma$ is
not sufficient. This is because $(\bar{x}-\mu)/(\sigma/\sqrt{n})$ is
normally distributed with zero mean and unit variance, but the exact
distribution of $(\bar{x}-\mu)/(s/\sqrt{n})$ depends on the sample
size n. The Student t distribution, given by Equation 11 and shown in
Figure 5, is the sampling distribution of

$$ t = \frac{\bar{x}-\mu}{s/\sqrt{n}} $$

$$ p(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{t^2}{\nu}\right)^{-(\nu+1)/2}, \qquad \nu = n-1 \tag{11} $$

    mean   = 0        std. dev. = $\sqrt{\nu/(\nu-2)}$      $\nu > 2$
    median = 0        skewness  = 0                         $\nu > 3$
    mode   = 0        kurtosis  = $3 + 6/(\nu-4)$           $\nu > 4$

Inflection points occur at $t = \pm\sqrt{\nu/(\nu+2)}$.

The t distribution approaches the unit normal as
$\nu \rightarrow \infty$. The definition of t and Equation 11 can be
used (1) to obtain confidence intervals for population means, or
(2) to determine if a sample mean represents a normal universe with
known mean but unknown variance, or (3) to determine if two sample
means represent the same normal universe with an unknown variance, or
(4) to determine equivalence of means for two samples from different
normal universes with unknown variances. Application of the Student t
distribution for such hypothesis tests will be treated in Section IV,
Significance Tests for Pairs of Means and Variances.
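
The Student t density of Equation 11 is straightforward to evaluate.
The sketch below works through logarithms of the gamma function
(`math.lgamma`) so that large degrees of freedom do not overflow; the
function names are hypothetical and the approach is one possible
implementation, not the report's.

```python
import math

def student_t_pdf(t, v):
    """Student t density with v degrees of freedom (Equation 11)."""
    log_norm = (math.lgamma((v + 1) / 2.0) - math.lgamma(v / 2.0)
                - 0.5 * math.log(math.pi * v))
    return math.exp(log_norm - (v + 1) / 2.0 * math.log1p(t * t / v))

def student_t_std(v):
    """Standard deviation sqrt(v / (v - 2)), defined only for v > 2."""
    if v <= 2:
        raise ValueError("standard deviation requires v > 2")
    return math.sqrt(v / (v - 2.0))

# Peak ordinate of the unit normal density, for comparison.
unit_normal_peak = 1.0 / math.sqrt(2.0 * math.pi)

print(student_t_pdf(0.0, 1), student_t_pdf(0.0, 1e6), unit_normal_peak)
```

With one degree of freedom the peak ordinate is 1/pi, and with very
many degrees of freedom it approaches the unit normal peak, in line
with the limiting behavior stated above.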

d.  Cauchy Distribution

A distribution of theoretical interest is obtained from the Student t
distribution by taking $\nu = 1$ and setting $t = (x-\mu)/\sigma$,
with $\mu$ and $\sigma$ serving respectively as location and scaling
parameters (but not as mean and standard deviation, since no finite
moments exist). The Cauchy distribution, given by Equation 12 and
shown in Figure 5 for $\nu = 1$, has no finite mean or standard
deviation:

$$ p(x) = \frac{1}{\pi\sigma}\cdot\frac{1}{1+[(x-\mu)/\sigma]^2} $$

$$ \int_{-\infty}^{x} p(x)\,dx = \frac{1}{2} + \frac{1}{\pi}\arctan\left(\frac{x-\mu}{\sigma}\right) \tag{12} $$

    median = $\mu$        mode = $\mu$

Inflection points occur at $x = \mu \pm \sigma/\sqrt{3}$.

The ratio of observations from two independent normal distributions
having zero means and standard deviations $\sigma_1$ and $\sigma_2$
is Cauchy distributed with $\mu = 0$ and $\sigma = \sigma_1/\sigma_2$.
Since this ratio can be inverted, the reciprocal of such a Cauchy
variate is also Cauchy distributed. The sum of independent Cauchy
variates is also Cauchy distributed. Consequently, the arithmetic
mean has the same Cauchy distribution as the individual observations.
Previously, it was noted that the distribution of the arithmetic mean
approaches the normal as sample size increases, no matter how the
parent population is distributed, so long as it has finite variance.
The Cauchy distribution does not meet the last condition. Because of
its infinite variance, the mean of a sample from a Cauchy pdf is no
more informative than a single observation.
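
Since the Cauchy parameters cannot be read from moments, one way to
interpret them is through the cumulative form in Equation 12: the
median falls at the location parameter and the quartiles fall one
scaling parameter on either side of it. The sketch below checks this;
the function names and parameter values are hypothetical.

```python
import math

def cauchy_pdf(x, mu=0.0, sigma=1.0):
    """Cauchy density with location mu and scaling parameter sigma."""
    u = (x - mu) / sigma
    return 1.0 / (math.pi * sigma * (1.0 + u * u))

def cauchy_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative form of the Cauchy density (Equation 12)."""
    return 0.5 + math.atan((x - mu) / sigma) / math.pi

# Hypothetical parameters: location 2, scale 3.
lower_quartile = cauchy_cdf(2.0 - 3.0, mu=2.0, sigma=3.0)
median = cauchy_cdf(2.0, mu=2.0, sigma=3.0)
upper_quartile = cauchy_cdf(2.0 + 3.0, mu=2.0, sigma=3.0)
print(lower_quartile, median, upper_quartile)
```

The scaling parameter is thus half the interquartile range, a measure
of dispersion that remains usable when the variance is infinite.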




AFFDL-TR-76-83 


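2.  DISTRIBUTIONS BOUNDED ON ONE SIDE

a.  Gamma Distribution

The gamma probability density function, an important one for random
variables bounded on one side, is given by Equation 13 and shown in
Figure 6:

$$ p(x) = \frac{1}{\beta\,\Gamma(\alpha)}\left(\frac{x-\gamma}{\beta}\right)^{\alpha-1}e^{-(x-\gamma)/\beta}, \qquad \alpha>0,\ \beta>0,\ x>\gamma \tag{13} $$

    mean value = $\gamma + \alpha\beta$      skewness = $2/\sqrt{\alpha}$
    std. dev.  = $\beta\sqrt{\alpha}$        kurtosis = $3 + 6/\alpha$

    Gamma function:           $\Gamma(\alpha) = \int_0^\infty t^{\alpha-1}e^{-t}\,dt$ ;  $\Gamma(1) = 1$
    recurrence formula:       $\Gamma(\alpha+1) = \alpha\,\Gamma(\alpha)$
    for $\alpha$ an integer:  $\Gamma(\alpha) = (\alpha-1)(\alpha-2)(\alpha-3)\cdots(1) = (\alpha-1)!$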

In Figure 6, variations in the location parameter $\gamma$ will shift
the curves to the left or right, while variations in the scaling
parameter $\beta$ will expand or contract the x axis. For
$\alpha > 1$, the curves have a mode at
$x = \gamma + \beta(\alpha-1)$ = mean - $\beta$. Inflection points
equidistant from the mode occur at
$x = \gamma + \beta[(\alpha-1) \pm \sqrt{\alpha-1}\,]$
= (mean - $\beta$) $\pm\ \beta\sqrt{\alpha-1}$. As the shape
parameter $\alpha$ increases, the skewness and kurtosis approach 0
and 3 respectively and the curve shape becomes more like the normal.

If $x_1, \ldots, x_n$ are independent random variables having gamma
distributions with common values of $\beta$ and $\gamma$, then their
sum $(x_1 + \cdots + x_n)$ also has a gamma distribution with the
same value of $\beta$, with location parameter $n\gamma$ (unchanged
when $\gamma = 0$), and with $\alpha = \alpha_1 + \cdots + \alpha_n$.

The gamma distribution is the appropriate model for the time required
for a total of $\alpha$ independent events to occur if the mean time
per event $\beta$ remains constant. An example would be $\alpha$ axis
crossings of the random wave in Figure 3. The time for one event, or
equivalently the time between events, is therefore a special case
called the exponential distribution.

b.  Exponential Distribution

The times between independent randomly occurring events are
distributed according to the exponential distribution. An example
would be the times between axis crossings of the random wave in
Figure 3. The exponential distribution, given by Equation 13a and
shown in Figure 6, is a gamma pdf with $\alpha = 1$:

$$ p(x) = \frac{1}{\beta}\,e^{-(x-\gamma)/\beta}, \qquad \beta > 0,\ x > \gamma \tag{13a} $$

    mean value = $\gamma + \beta$      skewness = 2
    std. dev.  = $\beta$               kurtosis = 9

Since the times between events preceding and succeeding a reference
point are measured in opposite directions, they may be given opposite
signs. The resulting symmetric double exponential or Laplace
probability density function is shown in Figure 3.

c.  Chi-Square Distribution

If random variable x is normally distributed with mean $\mu$ and
standard deviation $\sigma$, then chi-square ($\chi^2$) with n
degrees of freedom is defined as the sum of squares of n standardized
normal observations:

$$ \chi^2 = \sum_{i=1}^{n}\left[(x_i-\mu)/\sigma\right]^2 $$

The chi-square distribution, given by Equation 13b, is a gamma pdf
with $\alpha = n/2$, $\beta = 2$, and $\gamma = 0$:

$$ p(\chi^2) = \frac{1}{2\,\Gamma(n/2)}\left(\frac{\chi^2}{2}\right)^{n/2-1}e^{-\chi^2/2}, \qquad \chi^2 > 0 \tag{13b} $$

    mean value = n                skewness = $2/\sqrt{n/2}$
    std. dev.  = $\sqrt{2n}$      kurtosis = 3 + 12/n

For the case n = 1, this is the distribution of the squares of
standardized normal data, which has mean = 1, st. dev. = $\sqrt{2}$,
skewness = $2\sqrt{2}$, and kurtosis = 15. The chi-square
distribution plays a vital role in statistics because simple
transformations of it yield important sampling distributions, as
noted in the next four paragraphs.
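
Because the exponential and chi-square distributions are special
cases of the gamma pdf, their tabulated properties follow from the
gamma formulas by substitution. The sketch below makes that
substitution explicit; the function names are hypothetical.

```python
import math

def gamma_summary(alpha, beta, gamma=0.0):
    """Mean, st. dev., skewness, and kurtosis of the gamma pdf
    with shape alpha, scale beta, and location gamma."""
    return (gamma + alpha * beta,
            beta * math.sqrt(alpha),
            2.0 / math.sqrt(alpha),
            3.0 + 6.0 / alpha)

def chi_square_summary(n):
    """Chi-square with n degrees of freedom: alpha = n/2, beta = 2."""
    return gamma_summary(alpha=n / 2.0, beta=2.0)

exponential = gamma_summary(alpha=1.0, beta=1.0)   # unit-mean exponential
chi_square_1 = chi_square_summary(1)               # squares of unit normals
print(exponential, chi_square_1)
```

The exponential case returns skewness 2 and kurtosis 9, and the
one-degree-of-freedom chi-square case returns the values quoted above
for squared standardized normal data.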

d.  Sampling Distribution of the Variance and Variance Ratio

Multiplying chi-square by $\sigma^2/n$ gives the variance for a
sample of size n from a normal universe with mean $\mu$ and standard
deviation $\sigma$:

$$ s^2 = \frac{\sigma^2}{n}\,\chi^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2 $$

The sampling distribution of the variance $s^2$, given by
Equation 13c, is a gamma pdf with $\alpha = n/2$,
$\beta = 2\sigma^2/n$, and $\gamma = 0$:

$$ p(s^2) = \frac{1}{(2\sigma^2/n)\,\Gamma(n/2)}\left(\frac{s^2}{2\sigma^2/n}\right)^{n/2-1}e^{-s^2/(2\sigma^2/n)}, \qquad s^2 > 0 \tag{13c} $$

    mean value = $\sigma^2$                  skewness = $2/\sqrt{n/2}$
    std. dev.  = $\sigma^2/\sqrt{n/2}$       kurtosis = 3 + 12/n

Variances computed from independent samples of the same normal
population constitute a gamma pdf having a mean equal to the
population variance and a standard deviation equal to the population
variance divided by the square root of half the sample size.

The sample to population variance ratio $s^2/\sigma^2$, given by
$\chi^2/n$, is frequently more convenient to use:

$$ \frac{s^2}{\sigma^2} = \frac{\chi^2}{n} = \frac{1}{n}\sum_{i=1}^{n}\left[(x_i-\mu)/\sigma\right]^2 $$

The sampling distribution of the variance ratio $s^2/\sigma^2$, given
by Equation 13d, is a gamma pdf with $\alpha = n/2$, $\beta = 2/n$,
and $\gamma = 0$:

$$ p(s^2/\sigma^2) = \frac{1}{(2/n)\,\Gamma(n/2)}\left(\frac{s^2/\sigma^2}{2/n}\right)^{n/2-1}e^{-(s^2/\sigma^2)/(2/n)}, \qquad s^2/\sigma^2 > 0 \tag{13d} $$

    mean value = 1                     skewness = $2/\sqrt{n/2}$
    std. dev.  = $1/\sqrt{n/2}$        kurtosis = 3 + 12/n


This chi-square over degrees of freedom probability density function
is commonly employed in statistical theory to obtain confidence
intervals for variances, or to determine if a sample variance is
consistent with some fixed population value suggested by theory.
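
e.  F Distribution

The distribution of the ratio of two variates having independent
chi-square over degrees of freedom probability densities is also the
distribution of the ratio F of sample variances from two normal
universes:

$$ F = \frac{\chi_1^2/(n_1-1)}{\chi_2^2/(n_2-1)} = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} = \frac{s_1^2/s_2^2}{\sigma_1^2/\sigma_2^2} $$

When $\sigma_1^2 = \sigma_2^2$ this ratio reduces to $s_1^2/s_2^2$.
The F distribution, given by Equation 14 and shown in Figure 7, is
the sampling distribution of the ratio of two sample variances,
$F = s_1^2/s_2^2$, having $\nu_1$ and $\nu_2$ degrees of freedom:

$$ p(F) = \frac{\Gamma\left(\frac{\nu_1+\nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)}\left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2}F^{\nu_1/2-1}\left(1+\frac{\nu_1}{\nu_2}F\right)^{-(\nu_1+\nu_2)/2}, \qquad F > 0 \tag{14} $$

$$ \text{mean value} = \frac{\nu_2}{\nu_2-2} $$

$$ \text{std. dev.} = \frac{\nu_2}{\nu_2-2}\sqrt{\frac{2(\nu_1+\nu_2-2)}{\nu_1(\nu_2-4)}} $$

$$ \text{skewness} = \frac{(2\nu_1+\nu_2-2)\sqrt{8(\nu_2-4)}}{(\nu_2-6)\sqrt{\nu_1(\nu_1+\nu_2-2)}} $$

$$ \text{kurtosis} = 3 + \frac{12\left[(\nu_2-2)^2(\nu_2-4)+\nu_1(\nu_1+\nu_2-2)(5\nu_2-22)\right]}{\nu_1(\nu_2-6)(\nu_2-8)(\nu_1+\nu_2-2)} $$

Mean value, std. dev., skewness, and kurtosis formulas hold only for
$\nu_2 > 2$, $\nu_2 > 4$, $\nu_2 > 6$, and $\nu_2 > 8$ respectively.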

The F distribution is used in testing the hypothesis of equality
between variances $\sigma_1^2$ and $\sigma_2^2$. Specifically,
$\sigma_1^2 = \sigma_2^2$ would be rejected if either

$$ (s_1^2/s_2^2) < F_{n_1-1,\,n_2-1,\,\alpha_1} \quad \text{or} \quad (s_1^2/s_2^2) > F_{n_1-1,\,n_2-1,\,1-\alpha_2} $$

where the level of significance is $\alpha_1 + \alpha_2 < 1$.
Application of the F distribution for such hypothesis tests will be
treated in Section IV, Significance Tests for Pairs of Means and
Variances.

If $\nu_1 = 1$ and F is set equal to $t^2$, the F distribution,
Equation 14, reduces to the Student t distribution, Equation 11,
given previously for testing the hypothesis of equality between
means.
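
The reduction of the F distribution to the Student t distribution can
be verified numerically. Under the change of variable F = t squared,
the density of F with 1 and v degrees of freedom, multiplied by t,
should equal the Student t density for t > 0. The sketch below
assumes the standard forms of both densities; the function names and
test point are hypothetical.

```python
import math

def f_pdf(f, v1, v2):
    """F density with v1 and v2 degrees of freedom, via log-gamma."""
    log_norm = (math.lgamma((v1 + v2) / 2.0) - math.lgamma(v1 / 2.0)
                - math.lgamma(v2 / 2.0) + (v1 / 2.0) * math.log(v1 / v2))
    log_body = ((v1 / 2.0 - 1.0) * math.log(f)
                - (v1 + v2) / 2.0 * math.log1p(v1 * f / v2))
    return math.exp(log_norm + log_body)

def student_t_pdf(t, v):
    """Student t density with v degrees of freedom."""
    log_norm = (math.lgamma((v + 1) / 2.0) - math.lgamma(v / 2.0)
                - 0.5 * math.log(math.pi * v))
    return math.exp(log_norm - (v + 1) / 2.0 * math.log1p(t * t / v))

# Hypothetical test point: t = 1.3 with 7 degrees of freedom.
t, v = 1.3, 7
from_f = f_pdf(t * t, 1, v) * t
from_t = student_t_pdf(t, v)
print(from_f, from_t)
```

The factor t arises because both +t and -t map to the same value of
F, which doubles the density, while the derivative of t squared
contributes 2t to the denominator.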

f.  Rayleigh and Maxwell Distributions

Multiplying chi-square by $\sigma^2$ gives the error sum of squares
for a sample of size n from a normal population with mean $\mu$ and
standard deviation $\sigma$:

$$ \sigma^2\chi^2 = \sigma^2\sum_{i=1}^{n}\left(\frac{x_i-\mu}{\sigma}\right)^2 = \sum_{i=1}^{n}(x_i-\mu)^2 $$

The Rayleigh distribution, given by Equation 15 and shown in
Figure 8, is the probability density of $\sigma\chi = x$ for n = 2:

$$ p(x) = \frac{x}{\sigma^2}\,e^{-x^2/2\sigma^2}, \qquad x \ge 0 \tag{15} $$

    mean value = $\sigma\sqrt{\pi/2}$ = 1.25$\sigma$
    std. dev.  = $\sigma\sqrt{2-\pi/2}$ = 0.66$\sigma$
    skewness   = $\sqrt{\pi/2}\,(\pi-3)/(2-\pi/2)^{3/2}$ = 0.6311
    kurtosis   = $(8-3\pi^2/4)/(2-\pi/2)^2$ = 3.2451

The Maxwell distribution, given by Equation 16 and shown in Figure 9,
is the probability density of $\sigma\chi = x$ for n = 3:

$$ p(x) = \sqrt{\frac{2}{\pi}}\,\frac{x^2}{\sigma^3}\,e^{-x^2/2\sigma^2}, \qquad x \ge 0 \tag{16} $$

    mean value = $2\sigma\sqrt{2/\pi}$ = 1.60$\sigma$
    std. dev.  = $\sigma\sqrt{3-8/\pi}$ = 0.67$\sigma$
    skewness   = $\sqrt{2/\pi}\,(32/\pi-10)/(3-8/\pi)^{3/2}$ = 0.4857
    kurtosis   = $(15+16/\pi-192/\pi^2)/(3-8/\pi)^2$ = 3.1082

If the errors in the coordinates of a rectangular system are
independent and normally distributed with the same variance, then the
distribution of radial error is Rayleigh for a plane and Maxwell for
a volume. Similarly, if the rectangular components of a particle
velocity are independent and normally distributed with the same
variance, then the distribution of particle speed is Rayleigh for
motion in a plane and Maxwell for motion in a volume.
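
The numerical constants for the Rayleigh and Maxwell distributions
can be reproduced from the moments of the square root of chi-square.
The sketch below assumes the standard result that the k-th moment
about zero of that variable, for n degrees of freedom and unit sigma,
is 2^(k/2) Gamma((n+k)/2) / Gamma(n/2); the function names are
hypothetical.

```python
import math

def chi_moment(k, n):
    """k-th moment about zero of sqrt(chi-square), n degrees of
    freedom, for sigma = 1."""
    return 2.0 ** (k / 2.0) * math.gamma((n + k) / 2.0) / math.gamma(n / 2.0)

def radial_error_summary(n):
    """Mean, st. dev., skewness, kurtosis of x = sigma * chi, sigma = 1."""
    m1, m2, m3, m4 = (chi_moment(k, n) for k in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4
    return m1, math.sqrt(var), mu3 / var ** 1.5, mu4 / var ** 2

rayleigh = radial_error_summary(2)   # radial error in a plane
maxwell = radial_error_summary(3)    # radial error in a volume
print(rayleigh)
print(maxwell)
```

Multiplying the first two returned values by sigma gives the mean and
standard deviation for any scale; the skewness and kurtosis do not
depend on sigma.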

g.  Lognormal Distribution

For some stochastic processes, the normal probability density
function applies to the logarithms of measured data rather than to
the original measurements, which must then conform to the lognormal
probability density function. The lognormal distribution, given by
Equation 17 and shown in Figure 10, is the pdf for the antilogs of
normally distributed data:

$$ p(x) = \frac{1}{x\sigma\sqrt{2\pi}}\,e^{-(\ln x-\mu)^2/2\sigma^2}, \qquad x > 0 \tag{17} $$

    mean      = $e^{\mu+\sigma^2/2}$
    median    = $e^{\mu}$
    mode      = $e^{\mu-\sigma^2}$
    std. dev. = $e^{\mu+\sigma^2/2}\sqrt{e^{\sigma^2}-1}$
    skewness  = $(e^{\sigma^2}+2)\sqrt{e^{\sigma^2}-1}$
    kurtosis  = $3+(e^{\sigma^2}-1)(e^{3\sigma^2}+3e^{2\sigma^2}+6e^{\sigma^2}+6)$

Coefficient of variation = std. dev./mean = $\sqrt{e^{\sigma^2}-1}$

Inflection points occur at
$x = \exp\left(\mu-\tfrac{3}{2}\sigma^2 \pm \sigma\sqrt{1+\sigma^2/4}\right)$

Note that $\mu$ now serves as a scaling parameter and $\sigma$ as a
shape parameter. Given n independently distributed lognormal
variables with parameters
$(\mu_1,\sigma_1), \ldots, (\mu_n,\sigma_n)$, their product is also
lognormally distributed with parameters
$(\mu_1+\cdots+\mu_n,\ \sqrt{\sigma_1^2+\cdots+\sigma_n^2})$. The
quotient of two such variables is again lognormally distributed with
parameters $(\mu_1-\mu_2,\ \sqrt{\sigma_1^2+\sigma_2^2})$.

The lognormal distribution can be derived as the model for processes
in which total measurement error results from the multiplication of
many small errors. The lognormal model for products is thus analogous
to the normal model for sums. In particular, if samples of size n are
repeatedly drawn from a lognormal population and geometric means are
computed, these geometric means themselves constitute a new lognormal
distribution with parameters $(\mu,\ \sigma/\sqrt{n})$.
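
As a sketch of how the lognormal properties follow from the two
parameters of the underlying normal distribution, the function below
evaluates the standard closed forms. The function names and the
parameter values are hypothetical.

```python
import math

def lognormal_summary(mu, sigma):
    """Summary values of the lognormal pdf, where mu and sigma are
    the mean and st. dev. of the logarithms of the data."""
    w = math.exp(sigma ** 2)
    mean = math.exp(mu + sigma ** 2 / 2.0)
    return {
        "mean": mean,
        "median": math.exp(mu),
        "mode": math.exp(mu - sigma ** 2),
        "std_dev": mean * math.sqrt(w - 1.0),
        "skewness": (w + 2.0) * math.sqrt(w - 1.0),
        "kurtosis": 3.0 + (w - 1.0) * (w ** 3 + 3.0 * w ** 2 + 6.0 * w + 6.0),
    }

def geometric_mean_parameters(mu, sigma, n):
    """Lognormal parameters of the geometric mean of n observations."""
    return mu, sigma / math.sqrt(n)

summary = lognormal_summary(mu=0.0, sigma=1.0)
print(summary)
print(geometric_mean_parameters(0.0, 2.0, 16))
```

The mode, median, and mean always fall in that order for the
lognormal, which reflects its positive skewness.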

Figure 10.  Lognormal Probability Density Function

3.  DISTRIBUTIONS BOUNDED ON BOTH SIDES

a.  Beta Distribution

The beta probability density function, an important one for random
variables bounded on both sides, is given by Equation 18 and shown in
Figure 11:

$$ p(y) = \frac{\Gamma(p+q)}{\Gamma(p)\,\Gamma(q)}\cdot\frac{(y-a)^{p-1}(b-y)^{q-1}}{(b-a)^{p+q-1}}, \qquad a \le y \le b,\ p>0,\ q>0 \tag{18} $$

The transformation x = (y-a)/(b-a) changes Equation 18 to the
standard form (Equation 18a) bounded by zero and one:

$$ p(x) = \frac{\Gamma(p+q)}{\Gamma(p)\,\Gamma(q)}\,x^{p-1}(1-x)^{q-1}, \qquad 0 \le x \le 1,\ p>0,\ q>0 \tag{18a} $$

    mean value = $p/(p+q)$

    std. dev.  = $\sqrt{pq/(p+q+1)}\,/\,(p+q)$

    skewness   = $\dfrac{2(q-p)\sqrt{p+q+1}}{(p+q+2)\sqrt{pq}}$

    kurtosis   = $\dfrac{3(p+q+1)\left[2(p+q)^2+pq(p+q-6)\right]}{pq(p+q+2)(p+q+3)}$

For the beta probability density function (Equation 18a):

(1)  When p > 1 and q > 1, a single peak occurs at
     x = (p-1)/(p+q-2).

(2)  When p < 1 and q < 1, a single valley occurs at
     x = (p-1)/(p+q-2).

(3)  When p > 1 and q < 1, the distribution is J shaped.

(4)  When p < 1 and q > 1, the distribution is reverse J shaped.

(5)  When p = q, the distribution is symmetrical.

(6)  For all positive values of p and q, there are points of
     inflection at

$$ x = \frac{p-1}{p+q-2} \pm \frac{1}{p+q-2}\sqrt{\frac{(p-1)(q-1)}{p+q-3}} $$

     provided these values are real and lie between 0 and 1.

(7)  If both p and q are increased while maintaining the ratio
     $\mu$ = p/(p+q) constant, the variance decreases and the
     standardized distribution tends toward the normal distribution.

(8)  The more general form (Equation 18) has a mode at
     y = a + (b-a)(p-1)/(p+q-2).

Figure 11.  Beta Probability Density Function

The beta distribution is the appropriate model for the distribution
of the proportion of a population lying between the lowest and
highest values in a sample. A special case of the beta distribution
arises naturally as the distribution of
$V^2 = X_1^2/(X_1^2+X_2^2)$, where $X_1^2$, $X_2^2$ are independent
random variables distributed as chi-square with $\nu_1$, $\nu_2$
degrees of freedom respectively. $V^2$ is then distributed as a
standard beta (Equation 18a) with $p = \nu_1/2$ and $q = \nu_2/2$.

b.  Uniform Distribution

The triangular wave with a random phase or starting point has a
uniform probability density function as shown in Figure 3. The
uniform distribution, given by Equation 18b and shown as the
horizontal line in Figure 11, is a beta pdf with p = 1 and q = 1:

$$ p(x) = 1/(b-a), \qquad a \le x \le b \tag{18b} $$

    mean value = (a + b)/2              skewness = 0
    std. dev.  = $(b-a)/2\sqrt{3}$      kurtosis = 1.8

Solving for a and b in terms of the mean $\mu$ and standard deviation
$\sigma$ and substituting into p(x) gives the uniform pdf in the form
shown in Figure 3.

c.  Arc-Sine Distribution

The sine wave with a random phase or starting point has a U-shaped
arc-sine probability density function as shown in Figure 3. The
arc-sine distribution, given by Equation 18c and shown as the
U-shaped curve in Figure 11, is a beta pdf with p = 1/2, q = 1/2, and
-a < x < a:

$$ p(x) = \frac{1}{\pi\sqrt{a^2-x^2}}, \qquad -a < x < a \tag{18c} $$

    mean value = 0                 skewness = 0
    std. dev.  = $a/\sqrt{2}$      kurtosis = 1.5

Setting $a = \sigma\sqrt{2}$ in p(x) gives the arc-sine pdf in the
form shown in Figure 3. The name arc-sine comes from the cumulative
form of this pdf given by Equation 18d:

$$ \int_{-a}^{x} p(x)\,dx = \int_{-a}^{x}\left(\pi\sqrt{a^2-x^2}\right)^{-1}dx = \pi^{-1}\arcsin\frac{x}{a} + \frac{1}{2} \tag{18d} $$
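
The kurtosis values quoted for the uniform and arc-sine distributions
can be recovered from the general beta formulas by substituting
p = q = 1 and p = q = 1/2. The sketch below evaluates the standard
beta moment formulas on the unit interval; the function name is
hypothetical.

```python
import math

def beta_summary(p, q):
    """Mean, st. dev., skewness, kurtosis of the standard beta pdf
    on the interval from zero to one."""
    s = p + q
    mean = p / s
    std = math.sqrt(p * q / (s + 1.0)) / s
    skew = 2.0 * (q - p) * math.sqrt(s + 1.0) / ((s + 2.0) * math.sqrt(p * q))
    kurt = (3.0 * (s + 1.0) * (2.0 * s ** 2 + p * q * (s - 6.0))
            / (p * q * (s + 2.0) * (s + 3.0)))
    return mean, std, skew, kurt

uniform = beta_summary(1.0, 1.0)     # p = q = 1
arc_sine = beta_summary(0.5, 0.5)    # p = q = 1/2
print(uniform)
print(arc_sine)
```

On the unit interval the uniform standard deviation is 1/(2 sqrt 3)
and the arc-sine standard deviation is 1/(2 sqrt 2), which agree with
the (b-a) and a scalings given for the general forms.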

SECTION IV

SIGNIFICANCE TESTS FOR PAIRS OF MEANS AND VARIANCES

Statistical tests of significance are used to determine whether the
same statistical quantities computed from two different samples
differ by more than would be expected from sampling variations alone.
If they do, the conditions under which the two samples were obtained
have produced significant effects which must be accounted for in
subsequent analysis. The two most common statistical tests are the
F test and the t test used, respectively, to test for equality of
variances and equality of means computed from two different samples.

1.  TEST FOR EQUALITY OF VARIANCES

The F statistic is given by the formula

$$ F = \frac{s_1^2}{s_2^2} \tag{19} $$

where
    $s_1$ = larger of two standard deviation values
    $s_2$ = smaller of two standard deviation values

Significant differences in variance exist if this computed F exceeds
the tabulated F that will be found in the row and column headed
respectively by one less than the denominator sample size and one
less than the numerator sample size. Such standard F tables will be
found in almost any statistics text for several choices of the
desired level of significance, i.e., the probability of rejecting the
equality of variance hypothesis when it is, in fact, true.
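
A minimal sketch of the computation in Equation 19, placing the
larger sample variance in the numerator and returning the two
degrees of freedom needed to enter an F table. The function name and
data are hypothetical, and the sample variance is assumed to use the
n - 1 divisor (as `statistics.variance` does). The tabulated critical
value must still be looked up separately.

```python
from statistics import variance

def f_statistic(sample_a, sample_b):
    """Return (F, numerator df, denominator df) with the larger
    sample variance in the numerator."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1

# Hypothetical samples with variances 2.5 and 14.
f_value, df_num, df_den = f_statistic(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
)
print(f_value, df_num, df_den)
```

Here the second sample has the larger variance, so its size
determines the numerator degrees of freedom.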

2.  TEST FOR EQUALITY OF MEANS

The t test for equality of means for the two samples is of two forms
depending on whether the two variances are equal or not.

If the variances are equal ($\sigma_1^2 = \sigma_2^2$):

$$ t = \frac{\bar{x}_1-\bar{x}_2}{s_p\sqrt{1/n_1+1/n_2}} \quad \text{where} \quad s_p^2 = \frac{\sum x_{1i}^2 - (\sum x_{1i})^2/n_1 + \sum x_{2i}^2 - (\sum x_{2i})^2/n_2}{n_1+n_2-2} \tag{20} $$

If the variances are not equal ($\sigma_1^2 \neq \sigma_2^2$):

$$ t = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{s_1^2/n_1+s_2^2/n_2}} \quad \text{with} \quad df = \frac{\left(s_1^2/n_1+s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1+1}+\dfrac{(s_2^2/n_2)^2}{n_2+1}} - 2 \tag{21} $$

where
    $x_{1i}$, $x_{2i}$ = sample values
    $n_1$, $n_2$ = sample sizes
    $\bar{x}_1$, $\bar{x}_2$ = sample means
    $s_1$, $s_2$ = sample standard deviations
    df = degrees of freedom

For the equal variance case, significant differences in means exist
if the computed t exceeds in absolute value the tabulated t for
$n_1 + n_2 - 2$ degrees of freedom. For the unequal variance case,
significant differences in means exist if the computed t exceeds in
absolute value the tabulated t for the degrees of freedom given by
the expression for df. Standard t tables will be found in almost any
statistics text for several choices of the desired level of
significance (the probability of rejecting the equality of means if
true).
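
Both forms of the t statistic can be sketched as follows. The pooled
form uses raw sums of squares; the unequal variance form assumes
that Equation 21 gives the degrees of freedom with n + 1 divisors and
a final subtraction of 2, an older approximation that differs from
the form with n - 1 divisors used by much current software. Function
names and data are hypothetical.

```python
import math

def pooled_t(sample1, sample2):
    """Equal variance case: t statistic and degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = sum(sample1) / n1, sum(sample2) / n2
    ss1 = sum(x * x for x in sample1) - sum(sample1) ** 2 / n1
    ss2 = sum(x * x for x in sample2) - sum(sample2) ** 2 / n2
    df = n1 + n2 - 2
    sp = math.sqrt((ss1 + ss2) / df)
    return (mean1 - mean2) / (sp * math.sqrt(1.0 / n1 + 1.0 / n2)), df

def unequal_variance_t(sample1, sample2):
    """Unequal variance case: t statistic and approximate df
    (n + 1 divisors, minus 2; assumed form of Equation 21)."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = sum(sample1) / n1, sum(sample2) / n2
    v1 = sum((x - mean1) ** 2 for x in sample1) / (n1 - 1) / n1  # s1^2 / n1
    v2 = sum((x - mean2) ** 2 for x in sample2) / (n2 - 1) / n2  # s2^2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 + 1) + v2 ** 2 / (n2 + 1)) - 2.0
    return t, df

a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
t_pooled, df_pooled = pooled_t(a, b)
t_unequal, df_unequal = unequal_variance_t(a, b)
print(t_pooled, df_pooled, t_unequal, df_unequal)
```

The degrees of freedom for the unequal variance case are generally
not an integer, so a t table must be entered by interpolation or by
rounding down.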

3.  CHOICE  OF  LEVEL  OF  SIGNIFICANCE 

In choosing the level of significance for both the mean and variance
tests, one must bear in mind that minimizing the probability of
rejecting the equality hypothesis when it is, in fact, true increases
the probability of the opposite kind of error: accepting the equality
hypothesis when it is, in fact, false (that is, when the two means or
variances are not equal). If they are not equal they must differ by
some amount, the magnitude of which strongly affects the power of the
test, which is the probability of rejecting the equality hypothesis
when it is false. Clearly, when two statistics are not equal it is
much easier to reject an equality hypothesis if the difference is
large rather than small. In such circumstances the level of
significance should be set at the highest acceptable value in order
to maximize the power of detecting a given difference or to minimize
the difference detected with a given power.

SECTION V

STATISTICAL MEASURES OF INTERDEPENDENCE AMONG VARIABLES

Multiple measurements for two random variables can be characterized
by a measure of average value and dispersion for each variable plus
some measure of the interdependence or correlation between the two
variables. In Figure 12a, a bivariate observation is represented by
the coordinates of each point and a subdivision of both coordinate
axes is represented by each grid square.


Figure 12.  Multiple Observations for Two Random Variables
(a. Scatter Plot; b. Bivariate Frequency Distribution Function)


A bivariate histogram analogous to Figure 1a could be constructed
from Figure 12a by erecting above each grid square a bar with height
equal to the number of observations in that square. For large numbers
of observations and more refined subdivisions, such bivariate
histograms approximate continuous bivariate mathematical functions,
f(x,y), illustrated by the surface in Figure 12b. The total volume
under the surface f(x,y) represents the total number of observations,
and the volume between the surface and any region in the xy plane
represents the expected number of observations in that region. If the
volume under the surface is normalized to one by dividing f(x,y) by
the total number of observations, then the volume between the surface
and any region in the xy plane represents the probability that a
randomly selected bivariate observation will fall in that region.
The bivariate probability density, p(x,y), the value of the ordinate
for such a normalized surface, is defined as the limit of the ratio
of the probability associated with a given region to the area of the
region as each of its dimensions approaches zero. Bivariate
probability density functions are therefore used to define infinite
populations from which any given set of bivariate measurements
constitutes only a sample.

As in the univariate case, statistics are numbers selected or
computed from sample data to define various characteristics of the
sample as a whole. Likewise, the mathematical expression for every
bivariate probability density function contains one or more constant
parameters that define characteristics of the whole population.
Sample statistics are therefore used to estimate unknown population
parameters describing the same general characteristic. For each of
the two variables in a set of bivariate data, the characteristics of
chief interest are the same measures of mean value, dispersion,
skewness, and kurtosis that were defined for the univariate case.
One additional statistic for the bivariate case defines the degree
of interdependence or correlation between the two variables.


In Figure 12a, the standard deviations of the x and y coordinates
are respectively the horizontal and vertical root mean square
deviations of the points from an axis system with its origin at the
mean value coordinates. For this axis system a positive correlation
between the two variables is indicated by a predominance of
observation points in the first and third quadrants, while a
preponderance of observation points in the second and fourth
quadrants would indicate a negative correlation. This property is
quantified in measures of correlation defined in the following
paragraphs.

1. MEASURES OF BIVARIATE CORRELATION
a.  Coefficient  of  Correlation 

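A measure of the degree of relationship or association between two
variables is the coefficient of correlation, given by summing the
products of the standardized values for each observation and dividing
by the number of observations. Expressed mathematically

$$
r = \frac{1}{n}\sum_{i=1}^{n}
    \left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)
\qquad
\rho = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}
    \left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right)
    p(x,y)\,dy\,dx
\tag{22}
$$

Replacing s and σ by their definitions yields

$$
r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}
$$

$$
\rho = \frac{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x-\mu_x)(y-\mu_y)\,p(x,y)\,dy\,dx}
            {\sqrt{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x-\mu_x)^2\,p(x,y)\,dy\,dx}\,
             \sqrt{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(y-\mu_y)^2\,p(x,y)\,dy\,dx}}
\tag{22a}
$$

Multiplying products then summing and dividing by n yields

$$
r = \frac{\overline{xy}-\bar{x}\,\bar{y}}
         {\sqrt{\overline{x^2}-\bar{x}^2}\,\sqrt{\overline{y^2}-\bar{y}^2}}
\qquad
\rho = \frac{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}xy\,p(x,y)\,dy\,dx-\mu_x\mu_y}
            {\sqrt{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}x^2\,p(x,y)\,dy\,dx-\mu_x^2}\,
             \sqrt{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}y^2\,p(x,y)\,dy\,dx-\mu_y^2}}
\tag{22b}
$$

In any of these alternative definitions setting y = x makes r = 1
and ρ = 1, and setting y = -x makes r = -1 and ρ = -1. Thus a
correlation coefficient of one (or minus one) implies perfect positive
(or negative) correlation; that is, the two values in each bivariate
measurement lie the same number of standard deviation units away from
their respective mean values, in the same direction for positive
correlation and in the opposite direction for negative correlation.
Perfectly correlated variables are therefore identical except for
possible differences in the reference point and scaling unit, as for
example temperature in °F or °C. Zero correlation implies no linear
relation between the two variables since, in this case, the sums in
the numerators of Equations 22 and 22a above contain offsetting
positive and negative contributions.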
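The equivalence of the standardized-product form (Equation 22) and the moment form (Equation 22b) can be checked numerically. The following is a minimal sketch in Python, not part of the original report; the function names and the temperature values are illustrative assumptions, and standard deviations are taken with divisor n as the definitions above require.

```python
import math

def mean(values):
    return sum(values) / len(values)

def correlation_standardized(x, y):
    """Equation 22: average product of standardized values (divisor n)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / n

def correlation_moments(x, y):
    """Equation 22b: mean of products minus product of means."""
    mx, my = mean(x), mean(y)
    mxy = mean([a * b for a, b in zip(x, y)])
    mxx = mean([a * a for a in x])
    myy = mean([b * b for b in y])
    return (mxy - mx * my) / (math.sqrt(mxx - mx ** 2) * math.sqrt(myy - my ** 2))

# Illustrative data: the same temperatures in two scales differ only in
# reference point and scaling unit, so they are perfectly correlated.
celsius = [-10.0, 0.0, 12.5, 25.0, 40.0]
fahrenheit = [1.8 * c + 32.0 for c in celsius]

r_scales = correlation_standardized(celsius, fahrenheit)
r_negated = correlation_standardized(celsius, [-c for c in celsius])

# A small imperfectly correlated sample, evaluated with both forms.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r_form_22 = correlation_standardized(x, y)
r_form_22b = correlation_moments(x, y)
```

For long records with a large mean, the moment form subtracts two nearly equal numbers, so the deviation form is usually the safer one in floating point arithmetic.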


In defining the standard deviation an unbiased estimate for the
population was obtained by dividing the sum of the squared deviations
from the mean by n-1 rather than n whenever that mean was computed
from the same sample data as the standard deviation. A similar
consideration exists for the sample correlation. In this case an
unbiased estimate of the correlation ρ in a bivariate population is
given as

$$
\rho = \sqrt{\frac{(n-1)\,r^2-1}{n-2}}
\tag{22c}
$$

ρ = correlation estimate for a bivariate population

r = correlation computed from bivariate samples

n = number of samples




AFFDL-TR-76-83 


b.  Autocorrelation  (Serial  Correlation) 

If, in the defining equations for correlation (22, 22a and 22b),
the observations $x_i$ are measurements taken sequentially during n
equal intervals of time $\Delta t$, and the observations $y_i$ are
values not of a second variable but of the first variable measured m
intervals of time later, $y_i = x_{i+m}$ with $m \ll n$, then the
coefficient computed is the autocorrelation for time lag
$\tau = m\Delta t$. If we use this notation the Expressions 22a and
22b for the autocorrelation become

$$
r(\tau) = \frac{\dfrac{1}{n-m}\sum_{i=1}^{n-m}(x_i-\bar{x})(x_{i+m}-\bar{x})}
               {\dfrac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}
        = \frac{\overline{[x(t)-\bar{x}]\,[x(t+\tau)-\bar{x}]}}
               {\overline{[x(t)-\bar{x}]^2}}
\tag{23}
$$

$$
r(\tau) = \frac{\overline{x(t)\,x(t+\tau)}-\bar{x}^2}{\overline{x^2}-\bar{x}^2}
\tag{23a}
$$

In books and journals on time series statistics these expressions
are called normalized autocorrelations, with the autocorrelation
itself referring only to the first term of the numerator of Equation
23a. For both Equations 23 and 23a the full numerator is the
autocovariance function and the denominator is, of course, simply the
variance of the sequence of measurements. For some applications the
intervals of time in these definitions may be replaced by intervals
of distance along some pathway in space.


By computing autocorrelations for several sequential values of m
(and thus of $\tau$, since $\tau = m\Delta t$) an autocorrelation
function is obtained. Autocorrelation functions are useful in
identifying periodicities in sequential statistical data. For example,
hourly measurements of temperature have a diurnal cycle normally
rising from a dawn low to an afternoon high and then falling again.
Consequently the expected autocorrelation function would be cyclical
with maximum positive correlation at multiples of $m\Delta t$ = 24
hours and negative correlations at the half multiples in between. For
very long records this daily cycle would be superimposed upon a
similar annual cycle from winter lows to summer highs.
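The diurnal example can be illustrated with a short Python sketch. It is not from the original report: the temperature record is synthetic (a pure 24-hour sinusoid with hypothetical mean and amplitude), and the normalization assumed is the autocovariance averaged over the n - m available products, divided by the variance of the whole sequence.

```python
import math

def autocorrelation(x, m):
    """Normalized autocorrelation at lag m: autocovariance / variance."""
    n = len(x)
    xbar = sum(x) / n
    variance = sum((v - xbar) ** 2 for v in x) / n
    autocovariance = sum(
        (x[i] - xbar) * (x[i + m] - xbar) for i in range(n - m)
    ) / (n - m)
    return autocovariance / variance

# Hypothetical hourly temperatures: 20 days of a pure 24-hour cycle.
temps = [
    15.0 + 8.0 * math.sin(2.0 * math.pi * (hour - 9) / 24.0)
    for hour in range(24 * 20)
]

# Autocorrelation function for lags of 0 through 48 hours.
r_function = [autocorrelation(temps, m) for m in range(49)]
```

With this idealized record the function is close to +1 at lags of 24 and 48 hours and close to -1 at 12 hours; measured data would show weaker, noisier peaks at the same lags.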
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c.  Cross-Correlation 

If in the defining equations for correlation (22, 22a, 22b), the
observations $x_i$ are again measurements taken sequentially during
n equal intervals of time $\Delta t$, and the observations $y_i$ are
values of a second variable measured m intervals of time later with
$m \ll n$, then the coefficient computed is the cross-correlation for
time lag $\tau = m\Delta t$. If we use this notation the Expressions
22a and 22b for cross-correlation become

$$
r_{xy}(\tau) = \frac{\dfrac{1}{n-m}\sum_{i=1}^{n-m}(x_i-\bar{x})(y_{i+m}-\bar{y})}
                    {\sqrt{\dfrac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}\,
                     \sqrt{\dfrac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^2}}
\tag{24}
$$

$$
r_{xy}(\tau) = \frac{\overline{x(t)\,y(t+\tau)}-\bar{x}\,\bar{y}}
                    {\sqrt{\overline{x^2}-\bar{x}^2}\,\sqrt{\overline{y^2}-\bar{y}^2}}
\tag{24a}
$$

Note that cross-correlation becomes simple correlation for m = 0 or
$\tau$ = 0 and it becomes autocorrelation for
$x(t+\tau) = y(t+\tau)$. Accordingly, some statements similar to
those for autocorrelation can be made. In the literature about time
series statistics, the above expressions are called normalized
cross-correlations with the cross-correlation itself referring only
to the first term of the numerator of Equation 24a. For both
Equations 24 and 24a the full numerator is the cross-covariance
function and the denominator is, of course, simply the product of the
standard deviations for the two measurement sequences. For some
applications the intervals of time in these definitions may be
replaced by intervals of distance along some pathway in space.


By computing cross-correlations for several sequential values of m
(and thus of $\tau$, since $\tau = m\Delta t$) a cross-correlation
function is obtained. Cross-correlation functions are useful in
measuring response times to some prior stimulation or excitation. For
example, if a system disturbance originates at point A and is
transmitted directly to point B in time $\tau_1$ and indirectly to
the same point in time $\tau_2$, then the cross-correlation function
would be expected to rise to a strong relative maximum at time
$\tau_1$ and a weaker relative maximum at $\tau_2$. Conversely,
knowledge of such response times from cross-correlation maxima can be
useful in identifying transmission paths of disturbances in complex
systems.
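The two-path example can be sketched in Python as follows. This is an illustration only, not the report's procedure: the excitation is seeded pseudo-random noise, and the delays (5 and 12 intervals) and path gains (1.0 and 0.5) are hypothetical values chosen so that the direct path dominates.

```python
import math
import random

def cross_correlation(x, y, m):
    """Normalized cross-correlation of x(t) with y(t + m*dt)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sx = math.sqrt(sum((v - xbar) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - ybar) ** 2 for v in y) / n)
    covariance = sum(
        (x[i] - xbar) * (y[i + m] - ybar) for i in range(n - m)
    ) / (n - m)
    return covariance / (sx * sy)

random.seed(1)  # fixed seed so the sketch is repeatable
n = 2000
x = [random.gauss(0.0, 1.0) for _ in range(n)]  # disturbance at point A

# Response at point B: direct path (delay 5, gain 1.0) plus an
# indirect path (delay 12, gain 0.5). Both are assumed values.
y = [
    (x[i - 5] if i >= 5 else 0.0) + 0.5 * (x[i - 12] if i >= 12 else 0.0)
    for i in range(n)
]

lags = list(range(21))
r = {m: cross_correlation(x, y, m) for m in lags}
ranked_lags = sorted(lags, key=lambda m: r[m], reverse=True)
```

The first two entries of `ranked_lags` recover the two transmission delays, with the stronger maximum belonging to the direct path.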
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d.  Rank  Correlation 

If in the defining equations for correlation (22, 22a, 22b) the
observations $x_i$, $y_i$ are not measurements of continuously
variable magnitudes but represent instead the rank order of
observation i among all observations for the same variable, then the
coefficient computed is the rank correlation. In other words, if
observations of two variables are both independently ranked from
lowest to highest and these ranks rather than any measured values are
used for the $x_i$ and $y_i$ in Equations 22 and 22a, then the
resulting value is the coefficient of rank correlation. It is
equivalent to the following alternate form derived in Appendix A

$$
r = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)}
\tag{25}
$$

where

$d_i$ = rank difference for observation pair i
n = sample size

For y = x, the ranks of the two variables are identical for all
observations, making $d_i$ = 0 for all i and r = 1. For y = -x the
two variables have inverted rank orders and r = -1. Thus perfect
positive and perfect negative correlation are indicated respectively
by plus one and minus one coefficients of rank correlation. Zero
correlation is associated with random rank pairings that characterize
two unrelated variables.
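Equation 25 can be exercised with a few lines of Python. The sketch below is illustrative and assumes there are no tied observations, since the rank-difference form is exact only in that case; the helper names and data values are hypothetical.

```python
def ranks(values):
    """Rank each value from 1 (lowest) to n (highest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def rank_correlation(x, y):
    """Equation 25: r = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))

# Hypothetical paired observations on very different scales.
x = [3.1, 0.4, 7.9, 5.5, 2.2]
y = [10.0, 2.0, 40.0, 90.0, 5.0]

r_xy = rank_correlation(x, y)                       # ranks differ in two pairs
r_identical = rank_correlation(x, x)                # y = x
r_inverted = rank_correlation(x, [-v for v in x])   # y = -x
```

Only the ordering of the values enters the computation, which is why the large difference in scale between the two variables has no effect.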

Rank correlation is most appropriate for variables which are
spoken of in quantitative terms but are not capable of objective
measurement. System complexity, for example, is a composite variable
which cannot be measured in any single unit but it is sufficiently
well understood that several systems can generally be ranked in order
of their complexity.

e. Point Biserial Correlation

If in the defining equations for correlation (22, 22a, 22b) one of
the variables, say x, is not the measurement of a continuous magnitude
but is a discrete variable limited to either zero or one values, then
the coefficient computed is called the point biserial correlation. It
is equivalent to the following alternate form derived in Appendix B.

$$
r_{pb} = \frac{\bar{y}_1-\bar{y}_0}{s_y}\,\sqrt{p_0\,p_1}
\tag{26}
$$

$\bar{y}_0$ = mean of y measurements for which x = 0

$\bar{y}_1$ = mean of y measurements for which x = 1

$s_y$ = standard deviation of all y measurements

$p_0$ = proportion of y measurements for which x = 0

$p_1$ = proportion of y measurements for which x = 1

If the y means of the two groups are equal, the point biserial
correlation is zero, whatever the values of $p_0$, $p_1$, and $s_y$.
With equal proportions the point biserial correlation can equal one
only if the difference in y means is twice $s_y$.
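A quick numerical check, sketched here in Python under the assumption that $s_y$ is computed with divisor n, shows that Equation 26 gives the same value as the ordinary coefficient of correlation applied to a zero/one variable. The function names and data are hypothetical.

```python
import math

def pearson(x, y):
    """Ordinary coefficient of correlation (Equation 22a)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(x, y):
    """Equation 26, with s_y taken over all y using divisor n."""
    n = len(y)
    y0 = [b for a, b in zip(x, y) if a == 0]
    y1 = [b for a, b in zip(x, y) if a == 1]
    p0, p1 = len(y0) / n, len(y1) / n
    ybar = sum(y) / n
    s_y = math.sqrt(sum((b - ybar) ** 2 for b in y) / n)
    return (sum(y1) / len(y1) - sum(y0) / len(y0)) / s_y * math.sqrt(p0 * p1)

# Hypothetical data: x marks absence (0) or presence (1) of an attribute.
x = [0, 0, 0, 1, 1, 1, 1, 0]
y = [2.0, 3.5, 1.0, 6.0, 7.5, 5.0, 8.0, 4.0]
r_pb = point_biserial(x, y)
r_direct = pearson(x, y)

# Equal proportions with a difference in means of twice s_y.
r_limit = point_biserial([0, 0, 1, 1], [-1.0, -1.0, 1.0, 1.0])
```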

Point biserial correlation is appropriate for paired observations
one of which is a continuous measurement and the other is a simple
dichotomous measurement that classifies each observation into one of
two categories. The classification may be a quantitative one in which
the one or zero represents the presence or absence of a particular
attribute, or it may be a purely qualitative one in which the zero/one
assignment is made arbitrarily.

f. Tetrachoric Correlation

If in the defining equations for correlation (22, 22a, 22b) both
the x and y observations are not continuous magnitudes but discrete
variables limited to either zero or one values, then the coefficient
computed is called the tetrachoric correlation.* It is equivalent to
the following alternate form derived in Appendix C.

$$
r_\phi = \frac{ad-bc}{\sqrt{(a+c)(b+d)}\,\sqrt{(a+b)(c+d)}}
\tag{27}
$$

a = number of observations for which x = 0, y = 0
b = number of observations for which x = 1, y = 0
c = number of observations for which x = 0, y = 1
d = number of observations for which x = 1, y = 1

Clearly, if b = c = 0 then r = 1; if a = d = 0, then r = -1; and if
ad = bc then r = 0.

Tetrachoric correlation is appropriate for paired observations each
of which is a simple dichotomous classification of a single
observation into one of two categories. As before, the classification
may be a quantitative one in which the one or zero represents the
presence or absence of a particular attribute, or it may be a purely
qualitative one in which the zero/one assignment is made arbitrarily.


* The term tetrachoric correlation is sometimes reserved for the
case in which both x and y are continuous variates arbitrarily
reduced to two categories above and below some selected level.

The square of the tetrachoric correlation is related to chi-square
computed from the same data:

$$
r_\phi^2 = \chi^2/n
$$

where n = a + b + c + d is the total number of observations and

$$
\begin{aligned}
\chi^2 ={}& \frac{[a-(a+b)(a+c)/n]^2}{(a+b)(a+c)/n}
          + \frac{[b-(b+a)(b+d)/n]^2}{(b+a)(b+d)/n} \\
        & + \frac{[c-(c+a)(c+d)/n]^2}{(c+a)(c+d)/n}
          + \frac{[d-(d+b)(d+c)/n]^2}{(d+b)(d+c)/n}
\end{aligned}
\tag{27a}
$$

The numerator of each of the four terms on the right is the square
of the difference between the actual number of observations and the
expected number assuming the effect of changes in one of the variables
is independent of the value of the other. That is to say, there is no
interdependence or correlation between them.
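The relation between Equation 27 and Equation 27a is easy to confirm numerically. The Python sketch below uses hypothetical cell counts; the function names are illustrative and n is taken as a + b + c + d.

```python
import math

def phi_coefficient(a, b, c, d):
    """Equation 27 for a two by two table of counts."""
    return (a * d - b * c) / (
        math.sqrt((a + c) * (b + d)) * math.sqrt((a + b) * (c + d))
    )

def chi_square_2x2(a, b, c, d):
    """Equation 27a: sum of (actual - expected)^2 / expected over 4 cells."""
    n = a + b + c + d
    cells = [
        (a, (a + b) * (a + c) / n),
        (b, (b + a) * (b + d) / n),
        (c, (c + a) * (c + d) / n),
        (d, (d + b) * (d + c) / n),
    ]
    return sum((actual - expected) ** 2 / expected for actual, expected in cells)

# Hypothetical counts for the four x, y combinations.
a, b, c, d = 30, 10, 15, 45
n = a + b + c + d
r_phi = phi_coefficient(a, b, c, d)
chi2 = chi_square_2x2(a, b, c, d)

r_perfect_positive = phi_coefficient(5, 0, 0, 7)   # b = c = 0
r_perfect_negative = phi_coefficient(0, 4, 6, 0)   # a = d = 0
r_none = phi_coefficient(2, 4, 3, 6)               # ad = bc
```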
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g.  Contingency  Tables 

From the defining equations for correlation (22, 22a, 22b) point
biserial and tetrachoric correlations were obtained by assigning zero
and one values to the variable representing the dichotomous
classification of the statistical observations. If there are more
than two classes in either or both variables then this procedure
cannot be used. However, for multichotomous classifications of both
variables, it is still possible to compute the chi-square quantity
given for the two by two case in Equation 27a.

Consider for example, the three by four contingency table given
below:

      a          b          c          d       (a+b+c+d)
      e          f          g          h       (e+f+g+h)
      i          j          k          l       (i+j+k+l)
   (a+e+i)    (b+f+j)    (c+g+k)    (d+h+l)

The actual number of observations in the row one, column one position
is a. The expected number assuming no interdependence is that number
$a_0$ which bears the same ratio to the total number of observations
in column one that the total number of observations in row one bears
to the total number of observations in the table. Expressed
mathematically

$$
\frac{a_0}{a+e+i} = \frac{a+b+c+d}{(a+b+c+d)+(e+f+g+h)+(i+j+k+l)}
$$

The remaining expected values are computed by a corresponding use
of other marginal totals. Chi-square for this case is then given by

$$
\begin{aligned}
\chi^2 ={}& \frac{(a-a_0)^2}{a_0}+\frac{(b-b_0)^2}{b_0}+\frac{(c-c_0)^2}{c_0}+\frac{(d-d_0)^2}{d_0} \\
        & +\frac{(e-e_0)^2}{e_0}+\frac{(f-f_0)^2}{f_0}+\frac{(g-g_0)^2}{g_0}+\frac{(h-h_0)^2}{h_0} \\
        & +\frac{(i-i_0)^2}{i_0}+\frac{(j-j_0)^2}{j_0}+\frac{(k-k_0)^2}{k_0}+\frac{(l-l_0)^2}{l_0}
\end{aligned}
$$

The number of degrees of freedom for two way contingency tables is
the product of the number of rows minus one and the number of columns
minus one. For the example this is (3 - 1) x (4 - 1) = 2 x 3 = 6.
If this computed $\chi^2$ exceeds the tabulated $\chi^2$ for the same
number of degrees of freedom then a hypothesis of no relationship or
interdependence between the two classifications can be rejected with
a probability of error not greater than the level of significance for
the $\chi^2$ table employed.
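The chi-square computation for a general two way table can be sketched in Python as below. The counts are hypothetical, the function name is illustrative, and 12.592 is the commonly tabulated 5 percent critical value of chi-square for 6 degrees of freedom.

```python
def chi_square_table(table):
    """Return (chi_square, degrees_of_freedom) for a list-of-rows table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, actual in enumerate(row):
            # Expected count assuming no interdependence.
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (actual - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, degrees_of_freedom

# Hypothetical three by four table (rows a..d, e..h, i..l).
counts = [
    [12, 8, 5, 5],
    [6, 14, 9, 6],
    [4, 7, 13, 11],
]
chi2, dof = chi_square_table(counts)

CRITICAL_5_PERCENT_6_DOF = 12.592  # tabulated value, 6 degrees of freedom
reject_independence = chi2 > CRITICAL_5_PERCENT_6_DOF

# Rows in exact proportion: every cell equals its expected value.
proportional = [[2, 4, 6, 8], [1, 2, 3, 4], [3, 6, 9, 12]]
chi2_proportional, _ = chi_square_table(proportional)
```

For these counts the computed chi-square slightly exceeds the tabulated value, so the hypothesis of no interdependence would be rejected at the 5 percent level.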
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h.  Correlation  of  Attributes 

Contingency table classifications describing characteristics of
objects, systems, or processes are referred to as attributes. The
degree of relationship, association, or interdependence among the
classifications in a k by k contingency table is called the
correlation of attributes, $r_\phi$, given by the expression

$$
r_\phi = \sqrt{\frac{\chi^2}{n(k-1)}}
\tag{28}
$$

where $\chi^2$ and n are as previously defined. For k = 2, Equation 28
defines tetrachoric correlation, thus accounting for the identical
$\phi$ subscript found in Equation 27a. Identical row distributions
in all columns and identical column distributions in all rows imply
zero correlation. All observations on the diagonal imply a perfect
correlation of one. Diagonals do not exist in nonsquare contingency
tables, so an alternate measure of interdependence having less than
unit maximum value is defined in the following paragraph.

i. Coefficient of Contingency

A measure of the degree of relationship, association, or
interdependence of the classifications in an r by c contingency table
is the coefficient of contingency given by

$$
C = \sqrt{\frac{\chi^2}{n+\chi^2}}
  = \sqrt{1-\frac{1}{\sum_i\sum_j f_{ij}^2/(r_i\,c_j)}}
$$

$f_{ij}$ = number of observations in row i, column j of the
contingency table

$r_i$ = number of observations in row i

$c_j$ = number of observations in column j

$n = \sum_i r_i = \sum_j c_j = \sum_i\sum_j f_{ij}$ = total number of
observations

$\chi^2$ = the sum of ratios as described in previous paragraph "g"

The equality of these two expressions for the coefficient of
contingency C is established in Appendix D. If equality exists in each
position of a contingency table between the actual number of
observations and the expected number assuming no interdependence,
then both $\chi^2$ and C will equal zero. The larger the value of C,
the greater is the degree of interdependence. If quantitative
bivariate measurements are subdivided into a large number of interval
categories for each variate and each observation classified into the
resulting contingency table, then the coefficient of contingency for
the categorized observations approaches the coefficient of
correlation for the quantitative measurements. The number of rows and
columns in a contingency table determines the maximum value of C,
which is given by $\sqrt{(k-1)/k}$ for the case in which there are k
rows and k columns. For this special case only, an alternate measure
of interdependence having a maximum value of one for perfect
correlation is the correlation of attributes given in paragraph h.
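The behavior of the two measures on a square table can be illustrated with the Python sketch below. It assumes the usual form of the coefficient of contingency, C = sqrt(chi-square / (n + chi-square)), together with its equivalent expression in the cell counts and marginal totals; the tables and function names are hypothetical.

```python
import math

def chi_square(table):
    """Return (chi_square, n) for a list-of-rows table of counts."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    n = sum(rows)
    total = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            expected = rows[i] * cols[j] / n
            total += (f - expected) ** 2 / expected
    return total, n

def contingency_coefficient(table):
    """Assumed form: C = sqrt(chi2 / (n + chi2))."""
    chi2, n = chi_square(table)
    return math.sqrt(chi2 / (n + chi2))

def contingency_coefficient_direct(table):
    """Equivalent form: C = sqrt(1 - 1/S), S = sum f_ij^2 / (r_i c_j)."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    s = sum(
        f * f / (rows[i] * cols[j])
        for i, row in enumerate(table)
        for j, f in enumerate(row)
    )
    return math.sqrt(max(0.0, 1.0 - 1.0 / s))

def attribute_correlation(table):
    """Equation 28, for square k by k tables only."""
    chi2, n = chi_square(table)
    k = len(table)
    return math.sqrt(chi2 / (n * (k - 1)))

diagonal = [[10, 0, 0], [0, 20, 0], [0, 0, 30]]   # all counts on diagonal
mixed = [[12, 5, 3], [4, 15, 6], [2, 7, 16]]      # partial association

c_diagonal = contingency_coefficient(diagonal)     # maximum for k = 3
r_diagonal = attribute_correlation(diagonal)       # perfect correlation
c_mixed = contingency_coefficient(mixed)
c_mixed_direct = contingency_coefficient_direct(mixed)
```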
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2.  MEASURES  OF  MULTIVARIATE  CORRELATION 
a.  Multiple  Correlation 

A measure of the degree of relationship or association between one
variable and two or more others taken together is the coefficient of
multiple correlation. For the simplest case of three variables this is
given in terms of the bivariate correlations by the expression

$$
r_{0\cdot 12}^2 = \frac{r_{01}^2+r_{02}^2-2\,r_{01}\,r_{02}\,r_{12}}{1-r_{12}^2}
\tag{30}
$$

$r_{0\cdot 12}$ = multiple correlation between variable 0 and
variables 1 and 2

$r_{01}$ = correlation between variables 0 and 1
$r_{02}$ = correlation between variables 0 and 2
$r_{12}$ = correlation between variables 1 and 2

This coefficient of multiple correlation is equivalent to the simple
bivariate correlation between the measured values of a dependent
variable $x_{0i}$ and their corresponding estimates $x'_{0i}$ computed
from a linear combination of the independent variables
($x_{1i}$, $x_{2i}$), $x'_{0i} = b_0 + b_1 x_{1i} + b_2 x_{2i}$,
where the coefficients $b_0$, $b_1$, and $b_2$ are chosen to minimize
the mean square error $\sum_{i=1}^{n}(x_{0i}-x'_{0i})^2/n$. The
procedure for determining the b values according to this criterion is
given by regression theory.


Three special cases of Equation 30 are of interest: $r_{12}$ = 0,
$r_{02}$ (or $r_{01}$) = 0, and $r_{01} = r_{02}$. If $r_{12}$ = 0
then $r_{0\cdot 12}^2 = r_{01}^2 + r_{02}^2$. In words, if there is no
correlation between the independent variables the square of the
multiple correlation between the dependent variable and both
independent variables is equal to the sum of the squares of the simple
bivariate correlations between the dependent variable and each
independent variable.

If $r_{02}$ = 0 then $r_{0\cdot 12}^2 = r_{01}^2/(1-r_{12}^2)$. In
this case as $r_{12}^2$ increases from 0 toward its upper limit (which
Equation 31 sets at $1-r_{01}^2$ when $r_{02}$ = 0),
$r_{0\cdot 12}^2$ increases from $r_{01}^2$ to 1. This is surprising.
Since there is no correlation at all between variables zero and two,
it might be expected that the simple correlation between variables
zero and one would equal the multiple correlation between variable
zero and both one and two. However, this is true only if there is also
no correlation between variables one and two. If they are correlated
then variations in variable two will produce variations in variable
one but not in variable zero (since $r_{02}$ = 0). Thus with variable
two accounting for some of the variations in variable one the
remaining variation is more closely associated with variations in
variable zero than the simple bivariate correlation between them would
indicate. Therefore, increasing positive or negative correlation
between two independent variables one of which has no correlation with
the dependent variable produces an increasing multiple correlation
between the dependent and independent variables.

Turning now to the third special case of Equation 30, if
$r_{01} = r_{02} = r_0$ then
$r_{0\cdot 12}^2 = 2r_0^2/(1+r_{12})$. In this instance as $r_{12}$
increases from 0 to 1, $r_{0\cdot 12}^2$ decreases from $2r_0^2$ to
$r_0^2$; but as $r_{12}$ decreases from 0 toward -1,
$r_{0\cdot 12}^2$ increases from $2r_0^2$ to +1 (the value one being
reached at $r_{12} = 2r_0^2-1$, the lower limit allowed by Equation
31). The drop in $r_{0\cdot 12}^2$ as $r_{12}$ becomes increasingly
positive results from the fact that both variables one and two are
progressively accounting for more and more of the same variations in
variable zero, and one of them is therefore becoming more and more
redundant. The rise in $r_{0\cdot 12}^2$ as $r_{12}$ becomes more
negative results from the fact that variables one and two are
progressively accounting for more and more of the opposite variations
in variable zero, and both of them are therefore becoming more and
more critical. Therefore, decreasingly positive or increasingly
negative correlation between two independent variables, both of which
are positively (or both negatively) correlated with the dependent
variable, produces an increasing multiple correlation between the
dependent and independent variables. Also, increasingly positive
correlation between two independent variables, one of which is
positively and the other negatively correlated with the dependent
variable, produces an increasing multiple correlation between the
dependent and independent variables.
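The three special cases of Equation 30 can be tabulated with a short Python sketch. The correlation values used here are hypothetical and were chosen to stay within the admissible range, so that no squared multiple correlation exceeds one.

```python
def multiple_r_squared(r01, r02, r12):
    """Equation 30: squared multiple correlation of variable 0 on 1 and 2."""
    return (r01 ** 2 + r02 ** 2 - 2.0 * r01 * r02 * r12) / (1.0 - r12 ** 2)

# Case 1: r12 = 0, giving the sum of the squared simple correlations.
sum_of_squares = multiple_r_squared(0.6, 0.5, 0.0)

# Case 2: r02 = 0; the result grows from r01^2 as |r12| grows.
one_uncorrelated = [multiple_r_squared(0.6, 0.0, r12) for r12 in (0.0, 0.4, 0.8)]

# Case 3: r01 = r02 = r0, giving 2 r0^2 / (1 + r12), which falls as r12 rises.
equal_pair = [multiple_r_squared(0.5, 0.5, r12) for r12 in (-0.5, 0.0, 0.5, 0.9)]
```

In the second case the value reaches one at r12 = 0.8 because 0.8 squared equals 1 - 0.6 squared, which is the largest correlation between variables one and two that is compatible with r01 = 0.6 and r02 = 0.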

Multiple correlations greater than one will be obtained if certain
arbitrary combinations of simple bivariate correlation coefficients
are used in Equation 30 and its special cases. This will not occur in
practice, however, since intercorrelations among sets of variables
cannot be chosen arbitrarily subject only to the condition that they
are equal to or less than one in absolute value. Two variables, for
example, which are perfectly correlated with a third variable must of
necessity be perfectly correlated with each other. Given correlations
$r_{01}$ and $r_{02}$ among three variables it can be shown that the
limits for correlation $r_{12}$ will always be

$$
r_{12} = r_{01}\,r_{02} \pm \sqrt{1-r_{01}^2-r_{02}^2+r_{01}^2\,r_{02}^2}
\tag{31}
$$
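The limits of Equation 31 can be evaluated directly, as in the Python sketch below (the function names and input values are illustrative). At either limit the squared multiple correlation of Equation 30 equals one, which is why values outside the limits would give multiple correlations greater than one.

```python
import math

def r12_limits(r01, r02):
    """Equation 31: admissible range of r12 for given r01 and r02."""
    half_width = math.sqrt(1.0 - r01 ** 2 - r02 ** 2 + (r01 * r02) ** 2)
    return r01 * r02 - half_width, r01 * r02 + half_width

def multiple_r_squared(r01, r02, r12):
    """Equation 30."""
    return (r01 ** 2 + r02 ** 2 - 2.0 * r01 * r02 * r12) / (1.0 - r12 ** 2)

# Two variables perfectly correlated with a third must be perfectly
# correlated with each other: both limits coincide at one.
both_perfect = r12_limits(1.0, 1.0)

# Hypothetical intermediate case.
low, high = r12_limits(0.6, 0.3)
at_low = multiple_r_squared(0.6, 0.3, low)
at_high = multiple_r_squared(0.6, 0.3, high)
inside = multiple_r_squared(0.6, 0.3, 0.5 * (low + high))
```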

Extension of multiple correlation to more than three variables is
quite simple if matrix notation is employed. One need only define the
correlation matrix R to be the array of all the simple bivariate
correlations among the complete set of variables.

$$
R = \begin{bmatrix}
1      & r_{12} & r_{13} & \cdots & r_{1n} \\
r_{12} & 1      & r_{23} & \cdots & r_{2n} \\
r_{13} & r_{23} & 1      & \cdots & r_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r_{1n} & r_{2n} & r_{3n} & \cdots & 1
\end{bmatrix}
$$

$r_{ij}$ = element in row i col j and row j col i of R

$r^{ij}$ = element in row i col j and row j col i of $R^{-1}$

where $R^{-1}$ is the inverse of matrix R defined so that
$RR^{-1} = R^{-1}R = I$, I being the identity matrix having ones in
the principal diagonal from upper left to lower right with zeros in
all other positions.


The multiple correlation between variable i and all others is then
given by

$$
r_{i\cdot 1,2,\ldots,i-1,i+1,\ldots,n}
  = \sqrt{1-\frac{1}{r_{ii}\,r^{ii}}}
  = \sqrt{1-\frac{1}{r^{ii}}}
\tag{34}
$$

since $r_{ii}$ = 1 for a correlation matrix.

In defining the coefficient of correlation an expression, Equation
22c, was given for obtaining an unbiased estimate of correlation in a
bivariate population from the value computed from sample
measurements. Similarly Equation 35 below gives an unbiased estimate
of the population multiple correlation ρ between variable i and all
other variables from the value r computed from sample measurements

$$
\rho_{i\cdot 1,2,\ldots,i-1,i+1,\ldots,m}
  = \sqrt{1-\left(\frac{n-1}{n-m}\right)
          \left(1-r_{i\cdot 1,2,\ldots,i-1,i+1,\ldots,m}^2\right)}
\tag{35}
$$

ρ = multiple correlation estimate for population
r = multiple correlation computed from samples
m = number of variables
n = number of multivariate observations

Clearly with m = 2 this reduces to the previous Equation 22c.
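As a consistency check, the matrix form of Equation 34 can be compared with the three-variable form of Equation 30, and Equation 35 with Equation 22c. The Python sketch below is illustrative only: the small Gauss-Jordan inversion routine is a stand-in for whatever matrix library is available, and the correlation values are hypothetical.

```python
import math

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]

def multiple_correlation(R, i):
    """Equation 34: sqrt(1 - 1 / r^{ii}), r^{ii} from the inverse of R."""
    return math.sqrt(1.0 - 1.0 / invert(R)[i][i])

def population_estimate(r, n, m):
    """Equation 35; raises ValueError if the corrected square is negative."""
    return math.sqrt(1.0 - (n - 1) / (n - m) * (1.0 - r * r))

# Hypothetical correlations among variables 0, 1 and 2.
r01, r02, r12 = 0.6, 0.5, 0.3
R = [
    [1.0, r01, r02],
    [r01, 1.0, r12],
    [r02, r12, 1.0],
]
from_matrix = multiple_correlation(R, 0)
from_equation_30 = math.sqrt(
    (r01 ** 2 + r02 ** 2 - 2.0 * r01 * r02 * r12) / (1.0 - r12 ** 2)
)

# With m = 2 variables, Equation 35 should agree with Equation 22c.
estimate_35 = population_estimate(0.5, 10, 2)
estimate_22c = math.sqrt(((10 - 1) * 0.5 ** 2 - 1.0) / (10 - 2))
```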

b.  Marginal  Correlation 

The multiple correlation between one variable and some of the
remaining variables with the rest of the remaining variables ignored
is called a marginal correlation. The simple bivariate correlation is
a marginal correlation with all but two variables ignored.

c.  Conditional  or  Partial  Correlation 

If two variables are both correlated with a third variable, then
observations resulting solely from variations in this third variable
will introduce a spurious correlation between the first two
variables. A measure of the correlation between two variables that is
independent of variations in other correlated variables is called
conditional or partial correlation. For the simplest case of three
variables this is given in terms of the simple bivariate correlations
by the expression

$$
r_{12|3} = \frac{r_{12}-r_{13}\,r_{23}}{\sqrt{1-r_{13}^2}\,\sqrt{1-r_{23}^2}}
\tag{36}
$$

$r_{12|3}$ = conditional correlation between variables 1 and 2 for
fixed values of variable 3

$r_{12}$ = correlation between variables 1 and 2

$r_{13}$ = correlation between variables 1 and 3

$r_{23}$ = correlation between variables 2 and 3
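Multiple conditional correlations may also be defined. To do this,
first partition the correlation matrix to separate the conditioned
and conditioning variables as follows:

$$
R = \left[\begin{array}{ccc|ccc}
r_{11}    & \cdots & r_{1m}    & r_{1,m+1}   & \cdots & r_{1n}    \\
\vdots    &        & \vdots    & \vdots      &        & \vdots    \\
r_{m1}    & \cdots & r_{mm}    & r_{m,m+1}   & \cdots & r_{mn}    \\ \hline
r_{m+1,1} & \cdots & r_{m+1,m} & r_{m+1,m+1} & \cdots & r_{m+1,n} \\
\vdots    &        & \vdots    & \vdots      &        & \vdots    \\
r_{n1}    & \cdots & r_{nm}    & r_{n,m+1}   & \cdots & r_{nn}
\end{array}\right]
= \begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix}
\tag{37}
$$

where all diagonal elements $r_{ii}$ equal one and all
$r_{ij} = r_{ji}$.

Then compute the matrix T given by

$$
T = R_{11} - R_{12}\,R_{22}^{-1}\,R_{21}
\tag{38}
$$

The conditional correlation between variables i and j of the first
m variables for fixed values of the last (n-m) variables is then
given from elements $t_{ij}$ of matrix T by

$$
r_{ij|m+1,\ldots,n} = \frac{t_{ij}}{\sqrt{t_{ii}}\,\sqrt{t_{jj}}}
\tag{39}
$$

Multiple conditional correlations are then given by using the matrix
of these conditional correlations rather than R in Equation 34:

$$
r_{i\cdot 1,2,\ldots,i-1,i+1,\ldots,m|m+1,\ldots,n}
  = \sqrt{1-\frac{1}{r_{ii|m+1,\ldots,n}\;r^{ii|m+1,\ldots,n}}}
  = \sqrt{1-\frac{1}{r^{ii|m+1,\ldots,n}}}
\tag{40}
$$

since $r_{ii|m+1,\ldots,n}$ = 1 for any correlation matrix. Here
$r^{ii|m+1,\ldots,n}$ is the corresponding diagonal element of the
inverse of the matrix of conditional correlations.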
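The partitioned-matrix route of Equations 38 and 39 can be sketched in Python and compared with repeated use of the three-variable formula (Equation 36). Everything named here is illustrative: the inversion and multiplication helpers stand in for a matrix library, and the four-variable correlation matrix is hypothetical.

```python
import math

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def conditional_correlations(R, m):
    """Equations 38 and 39: first m variables, remaining variables fixed."""
    R11 = [row[:m] for row in R[:m]]
    R12 = [row[m:] for row in R[:m]]
    R21 = [row[:m] for row in R[m:]]
    R22 = [row[m:] for row in R[m:]]
    correction = matmul(matmul(R12, invert(R22)), R21)
    T = [[R11[i][j] - correction[i][j] for j in range(m)] for i in range(m)]
    return [
        [T[i][j] / math.sqrt(T[i][i] * T[j][j]) for j in range(m)]
        for i in range(m)
    ]

def partial(rab, rac, rbc):
    """Equation 36 for any three variables a, b, c."""
    return (rab - rac * rbc) / math.sqrt((1.0 - rac ** 2) * (1.0 - rbc ** 2))

# Hypothetical correlation matrix for variables 1 through 4.
R = [
    [1.0, 0.5, 0.4, 0.3],
    [0.5, 1.0, 0.6, 0.2],
    [0.4, 0.6, 1.0, 0.1],
    [0.3, 0.2, 0.1, 1.0],
]
r12_given_34 = conditional_correlations(R, 2)[0][1]

# Same quantity by conditioning on variable 3, then on variable 4.
r12_3 = partial(R[0][1], R[0][2], R[1][2])
r14_3 = partial(R[0][3], R[0][2], R[2][3])
r24_3 = partial(R[1][3], R[1][2], R[2][3])
r12_given_34_stepwise = partial(r12_3, r14_3, r24_3)
```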
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d.  Canonical  Correlation 


Correlations  between  linear  combinations  of  two  sets  of 
variables  subject  to  certain  restrictions  on  the  coefficients  pre- 
scribing these  linear  combinations  are  called  canonical  correlations. 
Specifically  suppose  there  is  a p variate  population  . . . xp  and 

a q variate  population  y^y^  . ..  y with  p ^ q for  definiteness. 

Then  p linear  combinations  of  the  x's.  u.j , u^,  ...  up  and  q linear 
combinations  of  the  y's,  v.| , v^,  ...  vp  can  be  found  all  with  zero 

mean  and  unit  variance  and  with  covariance  (u.,u.)  = 0,  covariance 

1 j 

(v.j , Vj ) = 0,  and  covariance  (u^.v^)  = 0 for  all  i f j.  The  corre- 
lation between  u^  and  v^  is  then  a canonical  correlation,  at  most  q 
of  which  are  non-zero.  If  u and  v are  the  linear  combinations  cor- 
responding to  the  largest  canonical  correlation,  then  v is  the  linear 
combination  of  the  y's  which  can  be  predicted  from  the  x’s  with  the 
least  residual  variance,  and  u is  che  appropriate  linear  prediction 
function. 


To obtain canonical correlations and their associated coefficients,
first partition the simple bivariate correlation matrix to identify the
correlations within and between the two sets of variables:

    R = \begin{bmatrix} R_{xx} & R_{xy} \\ R_{yx} & R_{yy} \end{bmatrix}

Then formulate the eigenvalue problem

    | R_{yx} R_{xx}^{-1} R_{xy} - \lambda R_{yy} | = 0                    (41)

and solve for the eigenvalues \lambda_1, ..., \lambda_q and the
eigenvectors \beta_1, ..., \beta_q. The eigenvectors normalized so that
\beta_i' R_{yy} \beta_i = 1 are the canonical coefficients for the
standardized y variables. The canonical correlations \gamma_i and the
coefficients \alpha_i of the standardized x variables are given by

    \gamma_i = \sqrt{\lambda_i},
    \alpha_i = R_{xx}^{-1} R_{xy} \beta_i / \gamma_i                      (41a)
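
The eigenvalue problem of Equation 41 can be sketched numerically as
follows. This is only an illustration under stated assumptions: the
function name, the random test data, and the use of a Cholesky factor of
R_yy to obtain a symmetric eigenproblem are choices made here, not part
of the report, and the sketch assumes all canonical correlations are
non-zero.

```python
import numpy as np

def canonical_correlations(R, p):
    """Canonical correlations from a (p+q) x (p+q) correlation matrix R
    whose first p rows/columns belong to the x set (hypothetical helper;
    assumes p >= q and non-zero canonical correlations)."""
    Rxx, Rxy = R[:p, :p], R[:p, p:]
    Ryx, Ryy = R[p:, :p], R[p:, p:]
    M = Ryx @ np.linalg.solve(Rxx, Rxy)          # R_yx R_xx^-1 R_xy
    # Turn |M - lambda R_yy| = 0 into a symmetric problem via R_yy = L L'.
    L_inv = np.linalg.inv(np.linalg.cholesky(Ryy))
    lam, w = np.linalg.eigh(L_inv @ M @ L_inv.T)
    order = np.argsort(lam)[::-1]                # largest eigenvalue first
    lam, w = lam[order], w[:, order]
    beta = L_inv.T @ w                           # satisfies beta' R_yy beta = 1
    gamma = np.sqrt(np.clip(lam, 0.0, None))     # canonical correlations
    alpha = np.linalg.solve(Rxx, Rxy @ beta) / gamma
    return gamma, alpha, beta

# Illustrative data only: three x variables and two y variables.
rng = np.random.default_rng(0)
data = rng.standard_normal((500, 5))
data[:, 3] += data[:, 0]                         # correlate the two sets
R = np.corrcoef(data, rowvar=False)
gamma, alpha, beta = canonical_correlations(R, p=3)
```

With these normalizations the combinations u = alpha'x and v = beta'y
have unit variance, and the matrix alpha' R_xy beta is diagonal with the
canonical correlations on its diagonal.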
e.  Autocorrelation

In the previous section on bivariate measures, autocorrelation was
defined in terms of a single sequence of time data points. Here
autocorrelation is defined in terms of an ensemble of such time history
data. In this multiple record case each observation time is a separate
variable with an observed measurement from each record. With
autocorrelation now defined as the first term in the numerator of
Equation 23a, Figure 13 shows both the time correlation for each
individual record and the ensemble correlation for each pair of
observation times. For a stationary random process, using the notation
of Figure 13 (angle brackets for ensemble averages, overbars for time
averages over record i),

    \langle y(t_i) \rangle = \langle y(t_j) \rangle,
    \langle y^2(t_i) \rangle = \langle y^2(t_j) \rangle,
    \langle y(t_i) y(t_j) \rangle = \langle y(t) y(t+\tau) \rangle,
    \tau = t_j - t_i                                                      (42)

For an ergodic random process it is also true that

    \langle y(t) \rangle = \overline{y_i},
    \langle y^2(t) \rangle = \overline{y_i^2},        i = 1, \ldots, n
    \langle y(t) y(t+\tau) \rangle = \overline{y_i(t) y_i(t+\tau)}        (43)

Stationarity thus implies that the mean and mean square ensemble
averages are independent of time and that the autocorrelation depends
only on the time difference, not on the particular starting or ending
times. Ergodicity implies stationarity and the equality of the ensemble
average with the time average of each record for the mean, mean square
and autocorrelation values. The converse of these statements is not
strictly true since stationarity and ergodicity involve all statistics,
not just the mean, mean square, and autocorrelation. For practical
purposes, however, Equations 42 and 43 are generally considered both
necessary and sufficient conditions for stationarity and ergodicity,
respectively.

For  an  ergodic  random  process  the  time  averaged  statistics 
from  any  one  record  are  equivalent  both  to  the  same  time  averaged 
statistics  for  any  other  record  and  to  the  same  ensemble  averaged 
statistics  for  any  time  or  set  of  times.  Therefore,  an  analysis  of 
a single  record  will  suffice  for  the  entire  ensemble  if  the  random 
process  is  ergodic.  This  is  of  course  the  reason  for  the  practical 
importance  of  ergodic  processes  in  time  series  statistics. 


For many applications, time series data from ergodic processes are best
treated in their reciprocal time or frequency domain. Spectral values
of this kind result from the Fourier transform of the random time
history function. Multiplying this transform by its own complex
conjugate produces the real valued autospectral (or power spectral)
density function. The autospectral density function also results
directly from the Fourier transform of the autocorrelation function (a
real valued transform since the autocorrelation function is symmetric
with respect to positive and negative values of the time interval \tau
in Equations 23 and 23a). The mathematical theory and computational
details for carrying out Fourier transformation of time data will not
be treated here since a wide literature exists on this subject. For
present purposes it is sufficient to note that spectral values may be
used in place of time data for the various statistical measures and
analysis techniques described herein. As in the bivariate case, the
time data may be replaced by space data in which measurements are made
at equal intervals along a line in space.
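
The two routes to the autospectrum described above can be compared
numerically. The sketch below is illustrative only: it assumes a
discrete, finite record, uses the circular (periodic) autocorrelation so
that the identity holds exactly, and applies a 1/n scaling chosen here
for convenience; the function name and test record are hypothetical.

```python
import numpy as np

def autospectrum_two_ways(y):
    """Return the autospectrum of record y computed (1) as the transform
    times its complex conjugate and (2) as the transform of the circular
    autocorrelation. Both use the same 1/n scaling (an assumption)."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    n = len(y)
    Y = np.fft.fft(y)
    direct = (Y * np.conj(Y)).real / n
    # Circular autocorrelation r[k] = (1/n) * sum_t y[t] * y[(t+k) mod n]
    r = np.array([np.dot(y, np.roll(y, -k)) for k in range(n)]) / n
    via_autocorrelation = np.fft.fft(r)
    return direct, via_autocorrelation

rng = np.random.default_rng(0)
direct, via_r = autospectrum_two_ways(rng.standard_normal(256))
```

Because the circular autocorrelation is symmetric in the lag, its
transform comes out real to within rounding error, which is the property
noted in the text.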
f.  Cross Correlation

In the previous section on bivariate measures, cross-correlation was
defined in terms of two different sequences of time data points. If
each sequence represents measurements from two different ergodic random
processes, then the relationships between the two records also apply to
the two ensembles. In particular the cross-correlation function, now
defined by the first term in the numerator of Equation 24a, is

    R_{xy}(\tau) = \frac{1}{T} \int_0^T x(t) y(t+\tau) dt                 (44)

For the special case x = y this defines the autocorrelation function.

As noted in the previous section, for many applications time data is
converted to the frequency domain by means of Fourier transforms. The
Fourier transform of x(t) multiplied by the complex conjugate of the
Fourier transform of y(t + \tau) gives the cross spectral density
function G_{xy}(f). The cross-spectral density function also results
directly from the Fourier transform of the cross-correlation function
given by Equation 44.

For more than two ensembles, spectral density functions between every
pair may be computed and arranged in a matrix with autospectra G_{ii}
in the diagonal positions and cross-spectra G_{ij} in the off diagonal
positions. Corresponding cross-spectral elements on opposite sides of
the diagonal, G_{ij} and G_{ji}, will be complex conjugates of one
another. The coherence function \gamma_{ij}^2(f) between records i and
j is then given by

    \gamma_{ij}^2(f) = \frac{G_{ij}(f) G_{ji}(f)}{G_{ii}(f) G_{jj}(f)}    (45)

Upon arranging the \gamma_{ij} into matrices, various multiple,
marginal, conditional, and canonical coherences may be defined in ways
analogous to correlations of the same kind.
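
A minimal sketch of the spectral matrix and of Equation 45 follows. It
rests on assumptions not stated in the report: the spectra are estimated
by averaging over non-overlapping segments of each record (without
averaging, the estimated coherence is identically one), no window is
applied, and the zero-frequency bin is dropped because the segment means
are removed. Function names and test records are hypothetical.

```python
import numpy as np

def spectral_matrix(records, nseg):
    """records: (m, N) array of m time histories. Returns G with shape
    (nfreq, m, m): autospectra on the diagonal, cross-spectra off it,
    averaged over nseg non-overlapping segments."""
    m, N = records.shape
    seg_len = N // nseg
    G = 0.0
    for s in range(nseg):
        seg = records[:, s * seg_len:(s + 1) * seg_len]
        X = np.fft.rfft(seg - seg.mean(axis=1, keepdims=True), axis=1)
        G = G + np.einsum('if,jf->fij', np.conj(X), X)   # G_ij = X_i* X_j
    return G / nseg

def coherence(G, i, j):
    """Equation 45 for records i and j, omitting the zero-frequency bin."""
    G = G[1:]
    return (G[:, i, j] * G[:, j, i]).real / (G[:, i, i].real * G[:, j, j].real)

rng = np.random.default_rng(1)
N = 4096
x = rng.standard_normal(N)
records = np.vstack([x,
                     x + 0.1 * rng.standard_normal(N),   # nearly the same as x
                     rng.standard_normal(N)])            # unrelated to x
G = spectral_matrix(records, nseg=32)
c_xy = coherence(G, 0, 1)
c_xz = coherence(G, 0, 2)
```

In this example c_xy stays close to one at every frequency while c_xz
stays near zero, mirroring the behavior of an ordinary squared
correlation.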
SECTION VI

FACTOR ANALYSIS OF MULTIPLE VARIABLES


The concept underlying factor analysis is best illustrated by an
example. Consider the following correlation matrix among eight
variables.

VARIABLE
NUMBER       1       2       3       4       5       6       7       8

   1      1.0000   .2208   .0624   .8088   .0888   .7800   .2952   .2208
   2       .2208  1.0000   .3080   .2972   .2540   .3040   .8520   .1356
   3       .0624   .3080  1.0000   .3448   .8912   .1416   .5176   .8024
   4       .8088   .2972   .3448  1.0000   .3504   .8948   .4380   .4772
   5       .0888   .2540   .8912   .3504  1.0000   .1608   .4472   .7460
   6       .7800   .3040   .1416   .8948   .1608  1.0000   .4000   .2872
   7       .2952   .8520   .5176   .4380   .4472   .4000  1.0000   .3216
   8       .2208   .1356   .8024   .4772   .7460   .2872   .3216  1.0000

First reorder the row and column numbers as follows:

Old Number    1   2   3   4   5   6   7   8
New Number    6   4   1   8   3   7   5   2
Then rewrite the above eight by eight matrix as follows:

VARIABLE
NUMBER       3       8       5       2       7       1       6       4

   3      1.0000   .8024   .8912   .3080   .5176   .0624   .1416   .3448
   8       .8024  1.0000   .7460   .1356   .3216   .2208   .2872   .4772
   5       .8912   .7460  1.0000   .2540   .4472   .0888   .1608   .3504
   2       .3080   .1356   .2540  1.0000   .8520   .2208   .3040   .2972
   7       .5176   .3216   .4472   .8520  1.0000   .2952   .4000   .4380
   1       .0624   .2208   .0888   .2208   .2952  1.0000   .7800   .8088
   6       .1416   .2872   .1608   .3040   .4000   .7800  1.0000   .8948
   4       .3448   .4772   .3504   .2972   .4380   .8088   .8948  1.0000


In this matrix the correlations in the three blocks along the diagonal
are all very high while those in the six off diagonal blocks are very
low. Thus the variables fall into three groups (3, 8, 5), (2, 7), and
(1, 6, 4) characterized by high within group correlations and low
between group correlations. Each group therefore represents a factor
that is measured rather well by any variable within the group and very
poorly by any variable outside the group. Of course, not all
correlation matrices can be reordered with such a clear distinction
between correlations in the diagonal and off diagonal blocks; the
exceptions are cases in which some of the variables are strongly
affected by two or more of the factors.

Continuing with this example, one can verify that all of the off
diagonal correlations in the reordered correlation matrix can be
obtained exactly by the following product of a matrix and its transpose:

    (rows of the first matrix in the variable order 3, 8, 5, 2, 7, 1, 6, 4)

    | .96  .24  .04 |
    | .82  .02  .26 |
    | .88  .18  .08 |     | .96  .82  .88  .10  .30  .00  .06  .28 |
    | .10  .86  .14 |  x  | .24  .02  .18  .86  .92  .12  .20  .16 |
    | .30  .92  .22 |     | .04  .26  .08  .14  .22  .84  .90  .94 |
    | .00  .12  .84 |
    | .06  .20  .90 |
    | .28  .16  .94 |



These are the coefficients in an equation representing each variable
V_j as a linear combination of the three more fundamental factors F_p:

    V_3 = .96 F_1 + .24 F_2 + .04 F_3
    V_8 = .82 F_1 + .02 F_2 + .26 F_3
    V_5 = .88 F_1 + .18 F_2 + .08 F_3
    V_2 = .10 F_1 + .86 F_2 + .14 F_3
    V_7 = .30 F_1 + .92 F_2 + .22 F_3
    V_1 = .00 F_1 + .12 F_2 + .84 F_3
    V_6 = .06 F_1 + .20 F_2 + .90 F_3
    V_4 = .28 F_1 + .16 F_2 + .94 F_3


Note that the first group of variables (3, 8, 5) is most heavily loaded
on the first factor F_1, the second group (2, 7) on the second factor
F_2, and the third group (1, 6, 4) on the third factor F_3. Clearly the
sum of the squares of the coefficients in each of these equations is
one of the diagonal terms in the above matrix product. This quantity is
called the communality and it represents the square of the correlation
between each variable and its common factor representation as given
above. The difference between one and the communality represents the
effect of a unique factor associated with each variable.

The matrix product shown above is not unique: a correlation matrix R
can be decomposed into many such products. In factor analysis the one
having maximum variance among the elements is chosen since this one
also maximizes the number of elements having very low and very high
absolute values. This, in turn, simplifies the interpretation of each
factor by representing each of them in terms of the smallest number of
variables.
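
The verification suggested above can be sketched in a few lines. The
arrays below are transcribed for illustration, with rows in the order
3, 8, 5, 2, 7, 1, 6, 4; the F_2 loadings of variables 6 and 4 are taken
as .20 and .16, the values that reproduce the tabulated correlations.
Variable names in the code are hypothetical.

```python
import numpy as np

# Loadings on F1, F2, F3; rows in the variable order 3, 8, 5, 2, 7, 1, 6, 4.
A = np.array([
    [.96, .24, .04],
    [.82, .02, .26],
    [.88, .18, .08],
    [.10, .86, .14],
    [.30, .92, .22],
    [.00, .12, .84],
    [.06, .20, .90],
    [.28, .16, .94],
])

# Reordered correlation matrix from the example (same variable order).
R = np.array([
    [1.0000, .8024, .8912, .3080, .5176, .0624, .1416, .3448],
    [.8024, 1.0000, .7460, .1356, .3216, .2208, .2872, .4772],
    [.8912, .7460, 1.0000, .2540, .4472, .0888, .1608, .3504],
    [.3080, .1356, .2540, 1.0000, .8520, .2208, .3040, .2972],
    [.5176, .3216, .4472, .8520, 1.0000, .2952, .4000, .4380],
    [.0624, .2208, .0888, .2208, .2952, 1.0000, .7800, .8088],
    [.1416, .2872, .1608, .3040, .4000, .7800, 1.0000, .8948],
    [.3448, .4772, .3504, .2972, .4380, .8088, .8948, 1.0000],
])

R_star = A @ A.T                         # product of the matrix and its transpose
off_diagonal = ~np.eye(8, dtype=bool)
max_error = np.abs(R_star - R)[off_diagonal].max()

communality = np.diag(R_star)            # sum of squared loadings per variable
uniqueness = 1.0 - communality           # share left to the unique factor
```

Here max_error is zero to within rounding, and the first communality
(variable 3) is .96^2 + .24^2 + .04^2 = .9808.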
1.  FACTOR ANALYSIS MODEL

The object of factor analysis is to define a large number of
interrelated variables in terms of a much smaller number of more
independent factors. The simplest mathematical model for describing a
variable in terms of several others is the linear representation. For
such a linear composite to be valid, however, all variable and factor
measurements must be referenced to the same origin and scaled in the
same units. To do this, one first subtracts from each observation
x_{ji} its mean value \bar{x}_j and then divides the resultant quantity
by its standard deviation s_j, a measure of the dispersion or scatter
in a set of observations. Thus transformed, the new standardized value
expresses the deviation from the mean in standard deviation units.
Expressed mathematically:

    z_{ji} = (x_{ji} - \bar{x}_j) / s_j        i = 1, \ldots, N
                                               j = 1, \ldots, n

where

    \bar{x}_j = \frac{1}{N} \sum_{i=1}^{N} x_{ji}

    s_j^2 = \frac{1}{N} \sum_{i=1}^{N} (x_{ji} - \bar{x}_j)^2

The classical factor analysis model may be written for the standardized
value of the jth variable and the ith observation as follows:

    z_{ji} = \sum_{p=1}^{m} a_{jp} F_{pi} + d_j U_{ji}        j = 1, \ldots, n
                                                              i = 1, \ldots, N    (47)
In this expression F_{pi} is the standardized value of the common
factor F_p for observation i, each of the m terms a_{jp} F_{pi}
represents the contribution of the corresponding factor to the linear
composite, and d_j U_{ji} is the residual, specific, or unique
contribution in the assumed representation of the observed measurement
z_{ji}. In the geometric representation of this model, the unique
factors are assumed to be mutually orthogonal and orthogonal to the
common factors, which are not necessarily assumed mutually orthogonal.
Note that the representation is not unique since the total number of
factors F_p, U_j exceeds the number of variables z_j.

The complete set of N values for each of the n variables can be
represented by the n x N matrix as follows:

    Z = [ z_{ji} ]        (row j = 1, ..., n; column i = 1, ..., N)

Similarly, the common and unique factors may be represented as

    F = [ F_{pi} ]  (m x N),        U = [ U_{ji} ]  (n x N)

The coefficients of these factors in Equation 47 may be represented by
the n by m and n by n matrices as follows:

    A = [ a_{jp} ]  (n x m),        D = diag(d_1, d_2, ..., d_n)  (n x n)
With these definitions, Equation 47 may be written in matrix form

    Z = AF + DU                                                           (48)

The matrix of observed correlations among the variables can be defined
in matrix notation by

    R = ZZ'/N                                                             (49)

where the prime denotes the transpose. If Equation 48, the factor
analysis model for the matrix Z, is substituted into this expression,
we have

    R = (AF + DU)(AF + DU)'/N
    R = A(FF'/N)A' + A(FU'/N)D' + D(UF'/N)A' + D(UU'/N)D'

The first and last quantities in parentheses, both having the same form
as Equation 49, are correlation matrices. The correlation matrix of the
common factors is denoted by \Phi = FF'/N. The correlation matrix of
the unique factors is an identity matrix since the unique factors are
assumed to be uncorrelated, i.e., represented by mutually orthogonal
axes. The remaining two terms in parentheses are both null matrices
since the common and unique factors are assumed to be uncorrelated,
i.e., mutually orthogonal. Thus, we have

    R = A(FF'/N)A' + D(UU'/N)D' = A \Phi A' + DD'                         (50)

If the common factors are also assumed to be uncorrelated or
orthogonal, \Phi is an identity matrix and

    R = AA' + DD'                                                         (51)

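Clearly, the correlation matrix derived from the common factors only is
given by

    R* = AA' =
        \begin{bmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nm} \end{bmatrix}
        \begin{bmatrix} a_{11} & \cdots & a_{n1} \\ \vdots & & \vdots \\ a_{1m} & \cdots & a_{nm} \end{bmatrix}    (52)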
This matrix (Equation 52) is the same as the former (Equation 51) in
the off-diagonal elements, but the diagonal elements, designated
communalities, are numbers less than one. In terms of the matrix
elements, they are given by

    h_j^2 = \sum_{p=1}^{m} a_{jp}^2        j = 1, \ldots, n

These communalities are the squares of the correlations r_{z_j z'_j}
between the total factor and the common factor representations of each
of the variables, as shown by the following:

    Given:  z_{ji}  = a_{j1} F_{1i} + \cdots + a_{jm} F_{mi} + d_j U_{ji}
            z'_{ji} = a_{j1} F_{1i} + \cdots + a_{jm} F_{mi}

    r_{z_j z'_j} = \sum_{i=1}^{N} z_{ji} z'_{ji}
                   / \sqrt{ \sum_{i=1}^{N} z_{ji}^2 \sum_{i=1}^{N} z'^2_{ji} }

    r_{z_j z'_j} = N h_j^2 / \sqrt{ N \cdot N h_j^2 } = h_j

The off-diagonal elements of Equation 50 or 51 are, of course, the
ordinary coefficients of correlation, given in terms of the matrix
elements for the variables j and k by

    r_{z_j z_k} = \sum_{p=1}^{m} a_{jp} a_{kp}


2.  NUMBER  OF  COMMON  FACTORS 

From matrix theory, it is known that the rank of AA' cannot exceed the
rank of A, which in turn cannot exceed its smaller dimension, in this
case the number of columns m. Consequently, although the reproduced
correlation matrix R* = AA' has order n equal to the number of
variables, its rank cannot exceed m, the number of common factors.
Since the number of common factors cannot be less than the rank of the
reproduced correlation matrix, the minimum number of common factors
must equal the minimum possible rank of the reproduced correlation
matrix. Since the correlation matrix reproduced from the common factors
differs from that reproduced from all the factors only in the diagonal
elements, one of the major problems of factor analysis is to determine
by how much the rank of a correlation matrix can be reduced from n by a
suitable choice of communalities in the diagonal. The computation of
such minimal rank communalities is so formidable even on modern
computers that it is not normally attempted. Instead, they are
approximated by the squared multiple correlations, given by one minus
the reciprocals of the corresponding elements in the diagonal of the
inverse of the correlation matrix. The squared multiple correlations
are known to be lower bounds for true minimal rank communalities and
approach the latter as the ratio of the number of factors to the number
of variables approaches zero.
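
The squared multiple correlation approximation described above is a
one-line computation. The sketch below uses a small hypothetical 3 x 3
correlation matrix and a hypothetical function name; it is an
illustration, not the report's program.

```python
import numpy as np

def squared_multiple_correlations(R):
    """One minus the reciprocal of each diagonal element of the inverse
    of the correlation matrix R."""
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))

# Hypothetical correlation matrix for three variables.
R = np.array([[1.0, 0.5, 0.6],
              [0.5, 1.0, 0.7],
              [0.6, 0.7, 1.0]])
smc = squared_multiple_correlations(R)

# Cross-check for the third variable: squared multiple correlation of
# variable 3 on variables 1 and 2, computed from the partitioned matrix.
r = R[:2, 2]
direct = r @ np.linalg.solve(R[:2, :2], r)
```

Each value in smc could then be placed in the corresponding diagonal
position of R as an initial communality estimate.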

3.  FACTOR  SOLUTION 


The solution for the a coefficients or loadings in the factor analysis
model, Equation 47, is an eigenvalue problem analogous to the one
encountered in determining normal modes of vibration or principal axes
of rotation in dynamics problems. The matrix equation in this case is

    \begin{bmatrix}
      h_1^2 - \lambda_p & r_{12} & \cdots & r_{1n} \\
      r_{21} & h_2^2 - \lambda_p & \cdots & r_{2n} \\
      \vdots & \vdots & & \vdots \\
      r_{n1} & r_{n2} & \cdots & h_n^2 - \lambda_p
    \end{bmatrix}
    \begin{bmatrix} a_{1p} \\ a_{2p} \\ \vdots \\ a_{np} \end{bmatrix} = 0

Here the r's are correlation coefficients between the variables, the
h's are the communalities or rank minimizing values of the previous
section, \lambda_p is one of the eigenvalues, and the column of a's is
the associated eigenvector, the elements of which serve as the
coefficients or loadings of the pth factor for the n variables when the
condition \lambda_p = \sum_{j=1}^{n} a_{jp}^2 is fulfilled. Some of the
eigenvalues will be zero since selecting the diagonals to minimize rank
is equivalent to minimizing the number of non-zero eigenvalues, or
equivalently the number of common factors, as desired. When the squared
multiple correlation is used to approximate the true rank minimizing
communalities in the diagonal, the exact positive semi-definite
character of the matrix is destroyed and the zero eigenvalues are
replaced with small positive and negative numbers which are simply
ignored. In practice, only those factors associated with the few
highest eigenvalues are needed in the factor analysis model (Equation
47) since the correlation matrix reproduced from these alone often
yields a very close approximation to the observed correlation matrix in
the off-diagonal elements, which are the elements of consequence.
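
The eigenvalue solution can be sketched as follows, assuming a symmetric
eigensolver and assuming the m retained eigenvalues are positive. The
function name and the small test matrix are hypothetical; unit-length
eigenvectors are rescaled so that the sum of squared loadings in each
column equals its eigenvalue.

```python
import numpy as np

def principal_factor_loadings(R, communalities, m):
    """Loadings of the m factors with the largest eigenvalues of the
    correlation matrix R after its diagonal is replaced by the given
    communalities (assumes those m eigenvalues are positive)."""
    R_reduced = np.array(R, dtype=float)         # copy, leaves R unchanged
    np.fill_diagonal(R_reduced, communalities)
    lam, vec = np.linalg.eigh(R_reduced)         # eigenvalues ascending
    keep = np.argsort(lam)[::-1][:m]
    lam, vec = lam[keep], vec[:, keep]
    loadings = vec * np.sqrt(lam)                # column p: sum a_jp^2 = lam_p
    return loadings, lam

# Hypothetical two-factor example built from known loadings.
A_true = np.array([[.9, .1], [.8, .2], [.1, .9], [.2, .7]])
R_star = A_true @ A_true.T
R = R_star.copy()
np.fill_diagonal(R, 1.0)
loadings, lam = principal_factor_loadings(R, np.diag(R_star), m=2)
```

The loadings returned generally differ from A_true by an orthogonal
rotation, yet they reproduce the same matrix R* = AA', which is the
non-uniqueness taken up under factor rotation.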

4.  FACTOR  ROTATION 

The form employed in deriving the factor coefficients or loadings
a_{jp} has the property that the sum of the contributions of the
successive factors makes the total communality a maximum under the
conditions relating these coefficients to the off-diagonal
correlations. As noted previously, no factor solution is unique and
other factor loadings not having this property would yield identical
correlation matrices.

Since factors are hypothetical constructs, their interpretation must be
in terms of the observable variables. The simplest possible
illustration of a clear cut factorization occurs when a sequence of
variables can be found in which the higher correlations occur in blocks
along the principal diagonal of the correlation matrix and the lower
correlations occur in all other positions. In terms of the factor
analysis model (Equation 47), this corresponds to a number of factors
equal to the number of blocks, with each variable having a
substantially higher squared loading coefficient on one factor than on
any of those remaining, each variable then becoming an imperfect
measure of one factor only. In slightly more complex illustrations,
maximum squared loadings will occur on several factors with minimum
loadings on all those remaining. Ideally then, for the simplest
physically meaningful interpretation of the hypothetical factors in
terms of observed variables, the squared loading coefficients should
approach their upper and lower bounds, one and zero respectively. This
implies the maximum possible variance in the squared loading
coefficients. Clearly, there must be some orientation of the orthogonal
factor axes for which the squared loading coefficients have greater
variance than for any other. Mathematically, this requires rotations to
maximize the following variance function, the new factor loadings now
denoted by b's.
    V = \frac{1}{n} \sum_{p=1}^{m} \sum_{j=1}^{n} \frac{b_{jp}^4}{h_j^4}
        - \frac{1}{n^2} \sum_{p=1}^{m}
          \left( \sum_{j=1}^{n} \frac{b_{jp}^2}{h_j^2} \right)^2          (56)

The h's are introduced so that in axis rotations each coefficient is
weighted equally rather than in proportion to its communality, which
would otherwise be the case. The actual rotations required to maximize
this function constitute a sequential iteration process. The resulting
b_{jp} coefficients are called the varimax loadings.
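
A compact way to carry out such a rotation numerically is sketched
below. It uses a commonly used iteration based on the singular value
decomposition, which is an assumption made here and not necessarily the
sequential procedure referred to in the text. The criterion function
follows Equation 56 with each row of loadings divided by h_j; function
names and the test loadings are hypothetical.

```python
import numpy as np

def varimax_criterion(B):
    """Equation 56 evaluated for an n x m loading matrix B."""
    n = B.shape[0]
    h2 = (B ** 2).sum(axis=1)                    # communalities h_j^2
    W = B ** 2 / h2[:, None]                     # b_jp^2 / h_j^2
    return (W ** 2).sum() / n - (W.sum(axis=0) ** 2).sum() / n ** 2

def varimax(A, max_iter=100, tol=1e-10):
    """Rotate loadings A orthogonally toward a maximum of Equation 56."""
    h = np.sqrt((A ** 2).sum(axis=1))
    L = A / h[:, None]                           # rows scaled to unit length
    n, m = L.shape
    T = np.eye(m)                                # accumulated rotation
    d_old = 0.0
    for _ in range(max_iter):
        B = L @ T
        grad = L.T @ (B ** 3 - B * (B ** 2).sum(axis=0) / n)
        U, s, Vt = np.linalg.svd(grad)
        T = U @ Vt                               # nearest orthogonal matrix
        d = s.sum()
        if d_old != 0.0 and d < d_old * (1.0 + tol):
            break
        d_old = d
    return (L @ T) * h[:, None]                  # restore original row lengths

# Hypothetical loadings with clear structure, deliberately mixed by 45 degrees.
A_simple = np.array([[.9, .1], [.8, .1], [.9, .2], [.1, .9], [.2, .8], [.1, .8]])
c = np.cos(np.pi / 4)
A_mixed = A_simple @ np.array([[c, -c], [c, c]])
B = varimax(A_mixed)
```

An orthogonal rotation leaves every communality unchanged, so only the
distribution of the squared loadings across factors differs between
A_mixed and B.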
where

    y_{ji} = observation i in group j
    a      = general value common to all measurements
    b_j    = an additional contribution characteristic of group j
    c      = coefficient of covariate x
    x_{ji} = covariate value for observation i in group j
    e_{ji} = error term for observation i in group j
    m      = number of groups
    n_j    = number of observations in group j

For the specific case of three groups with two measurements each, these
equations may be written out as follows using the fact that
b_3 = -b_1 - b_2 from \sum_{j=1}^{3} b_j = 0:

    y_{11} = a + 1 b_1 + 0 b_2 + c x_{11} + e_{11}
    y_{12} = a + 1 b_1 + 0 b_2 + c x_{12} + e_{12}
    y_{21} = a + 0 b_1 + 1 b_2 + c x_{21} + e_{21}
    y_{22} = a + 0 b_1 + 1 b_2 + c x_{22} + e_{22}
    y_{31} = a - 1 b_1 - 1 b_2 + c x_{31} + e_{31}
    y_{32} = a - 1 b_1 - 1 b_2 + c x_{32} + e_{32}                        (58)



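In matrix notation this set becomes

    \begin{bmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \\ y_{31} \\ y_{32} \end{bmatrix}
    =
    \begin{bmatrix}
      1 &  1 &  0 & x_{11} \\
      1 &  1 &  0 & x_{12} \\
      1 &  0 &  1 & x_{21} \\
      1 &  0 &  1 & x_{22} \\
      1 & -1 & -1 & x_{31} \\
      1 & -1 & -1 & x_{32}
    \end{bmatrix}
    \begin{bmatrix} a \\ b_1 \\ b_2 \\ c \end{bmatrix}
    +
    \begin{bmatrix} e_{11} \\ e_{12} \\ e_{21} \\ e_{22} \\ e_{31} \\ e_{32} \end{bmatrix}    (59)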
If one denotes the entire six by one column matrix on the left by Y,
the entire six by four matrix on the right by X, the entire four by one
column matrix by B, and the entire last six by one column matrix by e,
this becomes

    Y = XB + e                                                            (60)

Clearly this matrix Equation (60) remains valid for any number of
groups and any number of observations per group since only the
dimensions of the matrix change. For the same reason additional
covariates may be added without affecting the validity of matrix
Equation 60.
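
The construction of the matrix X for the one-way case can be sketched
for any number of groups. The function name and the covariate values
below are hypothetical; the coding follows the constraint that the b_j
sum to zero, so the last group receives -1 in every group column.

```python
import numpy as np

def one_way_design(groups, x):
    """Design matrix with columns [a, b_1, ..., b_(m-1), c] for group
    labels 1..m and a single covariate x."""
    groups = np.asarray(groups)
    m = int(groups.max())
    X = np.zeros((len(groups), m + 1))
    X[:, 0] = 1.0                        # general value a
    for j in range(1, m):
        X[groups == j, j] = 1.0          # +1 for observations in group j
        X[groups == m, j] = -1.0         # last group carries -b_1 - ... - b_(m-1)
    X[:, m] = x                          # covariate column
    return X

# Three groups with two measurements each, as in Equation 59.
groups = [1, 1, 2, 2, 3, 3]
x = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]      # illustrative covariate values
X = one_way_design(groups, x)
```

For this input the first three columns of X reproduce the pattern of
ones, zeros, and minus ones shown in Equation 59.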


b.  Two-Way Classification of Variables

For a two-way classification of the random variable y and covariate x
into m_j rows and m_k columns with n_{jk} measurements in cell jk, the
analysis of covariance model is

    y_{jki} = a + b_j + c_k + d_{jk} + f x_{jki} + e_{jki}

    \sum_{j=1}^{m_j} b_j = 0,        \sum_{k=1}^{m_k} c_k = 0,
    \sum_{j=1}^{m_j} d_{jk} = 0,     \sum_{k=1}^{m_k} d_{jk} = 0

    i = 1, \ldots, n_{jk};    j = 1, \ldots, m_j;    k = 1, \ldots, m_k

where

    y_{jki} = observation i in row j column k
    a       = general value common to all measurements
    b_j     = main effect for row j
    c_k     = main effect for column k
    d_{jk}  = interaction effect, row j column k
    f       = coefficient of covariate x
    x_{jki} = covariate for observation i in row j column k
    e_{jki} = error for observation i in row j column k
    m_j     = number of rows
    m_k     = number of columns
    n_{jk}  = number of observations in row j column k


For the specific case of three rows and three columns with two
measurements each, these equations may be written as follows using the
facts that b_3 = -b_1 - b_2, c_3 = -c_1 - c_2, d_{3k} = -d_{1k} - d_{2k},
and d_{j3} = -d_{j1} - d_{j2}:

    y_{111} = a + 1b_1 + 0b_2 + 1c_1 + 0c_2 + 1d_{11} + 0d_{12} + 0d_{21} + 0d_{22} + f x_{111} + e_{111}
    y_{112} = a + 1b_1 + 0b_2 + 1c_1 + 0c_2 + 1d_{11} + 0d_{12} + 0d_{21} + 0d_{22} + f x_{112} + e_{112}
    y_{121} = a + 1b_1 + 0b_2 + 0c_1 + 1c_2 + 0d_{11} + 1d_{12} + 0d_{21} + 0d_{22} + f x_{121} + e_{121}
    y_{122} = a + 1b_1 + 0b_2 + 0c_1 + 1c_2 + 0d_{11} + 1d_{12} + 0d_{21} + 0d_{22} + f x_{122} + e_{122}
    y_{131} = a + 1b_1 + 0b_2 - 1c_1 - 1c_2 - 1d_{11} - 1d_{12} + 0d_{21} + 0d_{22} + f x_{131} + e_{131}
    y_{132} = a + 1b_1 + 0b_2 - 1c_1 - 1c_2 - 1d_{11} - 1d_{12} + 0d_{21} + 0d_{22} + f x_{132} + e_{132}
    y_{211} = a + 0b_1 + 1b_2 + 1c_1 + 0c_2 + 0d_{11} + 0d_{12} + 1d_{21} + 0d_{22} + f x_{211} + e_{211}
    y_{212} = a + 0b_1 + 1b_2 + 1c_1 + 0c_2 + 0d_{11} + 0d_{12} + 1d_{21} + 0d_{22} + f x_{212} + e_{212}
    y_{221} = a + 0b_1 + 1b_2 + 0c_1 + 1c_2 + 0d_{11} + 0d_{12} + 0d_{21} + 1d_{22} + f x_{221} + e_{221}
    y_{222} = a + 0b_1 + 1b_2 + 0c_1 + 1c_2 + 0d_{11} + 0d_{12} + 0d_{21} + 1d_{22} + f x_{222} + e_{222}
    y_{231} = a + 0b_1 + 1b_2 - 1c_1 - 1c_2 + 0d_{11} + 0d_{12} - 1d_{21} - 1d_{22} + f x_{231} + e_{231}
    y_{232} = a + 0b_1 + 1b_2 - 1c_1 - 1c_2 + 0d_{11} + 0d_{12} - 1d_{21} - 1d_{22} + f x_{232} + e_{232}
    y_{311} = a - 1b_1 - 1b_2 + 1c_1 + 0c_2 - 1d_{11} + 0d_{12} - 1d_{21} + 0d_{22} + f x_{311} + e_{311}
    y_{312} = a - 1b_1 - 1b_2 + 1c_1 + 0c_2 - 1d_{11} + 0d_{12} - 1d_{21} + 0d_{22} + f x_{312} + e_{312}
    y_{321} = a - 1b_1 - 1b_2 + 0c_1 + 1c_2 + 0d_{11} - 1d_{12} + 0d_{21} - 1d_{22} + f x_{321} + e_{321}
    y_{322} = a - 1b_1 - 1b_2 + 0c_1 + 1c_2 + 0d_{11} - 1d_{12} + 0d_{21} - 1d_{22} + f x_{322} + e_{322}
    y_{331} = a - 1b_1 - 1b_2 - 1c_1 - 1c_2 + 1d_{11} + 1d_{12} + 1d_{21} + 1d_{22} + f x_{331} + e_{331}
    y_{332} = a - 1b_1 - 1b_2 - 1c_1 - 1c_2 + 1d_{11} + 1d_{12} + 1d_{21} + 1d_{22} + f x_{332} + e_{332}

If one denotes the entire eighteen by one column matrix on the left by
Y, the entire eighteen by ten matrix on the right by X, the entire ten
by one column matrix by B, and the last eighteen by one column matrix
by e, this becomes

    Y = XB + e

This is the same matrix equation previously obtained, differing only in
having higher dimensions from the addition of a second classification
of variables. Clearly the same would apply for three or more
classifications of variables. As before, the number of groups per
classification, the number of measurements per group, and the number of
covariates also affect only the dimensions of matrix Equation 60, which
remains valid. The elements of matrix B are called regression
coefficients.


2.  COMPUTING  THE  MATRIX  OF  REGRESSION  COEFFICIENTS 

So  far  nothing  has  been  said  about  the  means  of  obtaining  the 
values  in  the  matrix  B.  Since  the  mathematical  model  contains  an 
error  term  for  each  measurement,  the  elements  of  B could  have  any 
values  whatsoever  and  the  error  term  could  then  be  adjusted  to  make 
the  equality  true.  The  unstated  assumption,  of  course,  has  been  that 
the  errors  are  to  be  minimized  in  some  fashion  to  give  a best  fit  of 
the  function  to  the  data.  Minimizing  the  simple  sum  of  the  errors 
would  be  inappropriate  since  large  positive  and  negative  errors 
would  offset  each  other  and  appear  as  little  or  no  error.  Instead 
the  sum  of  the  squares  of  the  errors  is  minimized  to  give  the  well 
known  least-squares  fit. 


One of the equalities given by the matrix Equation 60 can be written as

    y_i = \sum_{j=1}^{m} x_{ji} b_j + e_i

in which the single subscript i replaces the set of subscripts used
above to designate the measurement and the single subscript j replaces
the set of subscripts used to distinguish the various group and
interaction effects as well as the coefficients of the covariates. The
error sum of squares is therefore given by

    SSE = \sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{m} x_{ji} b_j \right)^2             (65)

To find the values of the b_j for which this is a minimum, the
derivatives with respect to the b_k are set equal to zero in accordance
with the usual procedure in calculus:

    \frac{\partial SSE}{\partial b_k}
        = -2 \sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{m} x_{ji} b_j \right) x_{ki} = 0
                                                        k = 1, \ldots, m

    \sum_{i=1}^{n} \sum_{j=1}^{m} x_{ki} x_{ji} b_j = \sum_{i=1}^{n} x_{ki} y_i
                                                        k = 1, \ldots, m

This last equation may be written in matrix form

    X'X B = X'Y

These are called the normal equations. They may be solved to obtain the
minimizing values of the regression coefficients B:

    B = (X'X)^{-1} X'Y                                                    (66)
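
A minimal numerical sketch of the normal equations follows. The data
are randomly generated for illustration and the function name is
hypothetical. The linear system X'X B = X'Y is solved directly instead
of forming the inverse explicitly, which gives the same B as Equation 66
whenever X'X is nonsingular.

```python
import numpy as np

def regression_coefficients(X, Y):
    """Solve the normal equations X'X B = X'Y for B."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

# Illustrative data: an intercept column plus two predictors.
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(20), rng.standard_normal((20, 2))])
Y = X @ np.array([1.0, 2.0, -0.5]) + 0.1 * rng.standard_normal(20)

B = regression_coefficients(X, Y)
residual = Y - X @ B
SSE = float(residual @ residual)         # Equation 65 at the minimizing B
```

At the minimizing B the residuals are orthogonal to every column of X,
which is exactly what setting the derivatives to zero requires.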
3.  SIGNIFICANCE TESTS FOR REGRESSION COEFFICIENTS

These elements of B obtained from sample data are only estimates of the
true population parameters. Zero values for any subset of these
parameters would indicate no contribution to the usefulness of the
prediction model (Equation 60). Consequently some statistical test is
needed to determine if the computed values for any subset are
sufficiently close to zero to warrant such an inference for the
population parameters. If so, the mathematical prediction model may be
simplified by deleting this subset of variables.

a. Sums of Squares

To perform statistical significance tests all m of the $b_j$ elements
in matrix B are computed from Equation 66 and the error sum of
squares SSE is computed from Equation 65. Then with some k (k<m) of
the $b_j$ set equal to zero all m-k of the remaining $b_j$ elements
in matrix B are recomputed along with a new and larger total error
sum of squares, SST. This new total error is larger because the
assumption that k of the $b_j$ equal zero has changed them from the
minimizing values previously computed. The difference between this
larger total error sum of squares (computed from new minimizing
values for those $b_j$ not assumed to be zero) and the original error
sum of squares (computed from minimizing values for all the $b_j$) is
called the hypothesis sum of squares, SSH. This relationship can be
symbolized by the right triangle shown in Figure 14.


Figure 14. Error, Total and Hypothesis Sums of Squares

The relation of these sums of squares to other such sums is given
in Appendix E. There, Figure 14 corresponds to the triangle TRS.

b. Variance Estimates

The number of degrees of freedom associated with SSE is the sample
size n minus the number m of the $b_j$ computed. Dividing SSE by n-m
gives an estimate of the variance of the data about the regression
function containing all m of the $b_j$. Likewise the number of
degrees of freedom associated with SST is the sample size n minus the
number m-k of the $b_j$ computed, and dividing SST by n-(m-k) gives
an estimate of the variance of the data about the regression function
containing m-k of the $b_j$. This information is summarized in an
analysis of variance table as follows.

Analysis of Covariance

Source        Sum of Squares    df       Mean Square
Hypothesis    SSH               k        SSH/k
Error         SSE               n-m      SSE/(n-m)
Total         SST               n-m+k

If zero-valued population parameters actually do correspond to the k
of the $b_j$ elements set equal to zero, then both SSE/(n-m) and
SST/(n-m+k) estimate the same variance. In this case the hypothesis
sum of squares SSH arises solely from sampling variation and dividing
SSH by its k degrees of freedom yields another estimate of the
population variance. As suggested by the perpendicular lines in
Figure 14, SSH/k and SSE/(n-m) are independent estimates of variance.
As previously noted the ratio of two independent estimates is
associated with the F statistic.
c. Variance Ratio Test

This F ratio can be used to test statistically the hypothesis of zero
values for the k population parameters corresponding to the $b_j$
elements set equal to zero in SSH. The testing procedure is as
follows.

(1) Choose the acceptable level of risk, that is, the probability of
rejecting the hypothesis when it is in fact true.

(2) From a statistical F table for that level of risk select the
tabulated F value for k degrees of freedom in the numerator and n-m
degrees of freedom in the denominator.

(3) Obtain the computed F ratio from

$$F = \frac{SSH/k}{SSE/(n-m)}$$

(4) If this computed F exceeds the tabulated F reject the hypothesis
of zero values for all k of the population parameters corresponding
to the $b_j$ elements set equal to zero in SSH.

(5) If the hypothesis of zero values for the k population parameters
is rejected, the values obtained in B of Equation 66 can be taken as
the best estimates with existing data.
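The arithmetic of steps (3) and (4) can be sketched as follows. This
is an illustration under stated assumptions, not the report's own
program: the function names are hypothetical, the sums of squares are
made-up numbers, and the tabulated F value must still be read from a
table for the chosen level of risk.

```python
def variance_ratio(sse, sst, k, n, m):
    """Return (SSH, F) for a full model with m coefficients and a reduced
    model in which k of them are set to zero, fitted to n observations."""
    if not (0 < k < m < n):
        raise ValueError("need 0 < k < m < n")
    ssh = sst - sse              # hypothesis sum of squares
    ms_hypothesis = ssh / k      # SSH / k
    ms_error = sse / (n - m)     # SSE / (n - m)
    return ssh, ms_hypothesis / ms_error


def reject_zero_hypothesis(f_computed, f_tabulated):
    """Step (4): reject when the computed F exceeds the tabulated F."""
    return f_computed > f_tabulated


# Hypothetical sums of squares, for illustration only.
ssh, f = variance_ratio(sse=40.0, sst=100.0, k=2, n=25, m=5)
# 3.49 is the commonly tabulated 5 percent value for (2, 20) degrees of freedom.
decision = reject_zero_hypothesis(f, f_tabulated=3.49)
```

Here SSH = 60, the error mean square is 40/20 = 2, and the computed F
is 30/2 = 15, so the hypothesis of zero values would be rejected at
that level of risk.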
4. TRANSFORMED GENERAL LINEAR HYPOTHESIS MODELS

If each measurement and associated covariate is replaced by its
logarithm, Equation 64 becomes

$$\log y_i = \sum_{j=1}^{m} b_j \log x_{ji} + e_i \log 10$$

$$\log y_i = \sum_{j=1}^{m} \log x_{ji}^{\,b_j} + \log 10^{e_i} \qquad (67)$$

$$\log y_i = \log \Bigl( e'_i \prod_{j=1}^{m} x_{ji}^{\,b_j} \Bigr)$$

where $e'_i = 10^{e_i}$.

This product results when input observations are subjected to a
logarithmic transformation before using a general linear hypothesis
procedure. It is the appropriate model to use for statistical
prediction functions if the error terms are multiplicative, with 1
playing the same role that zero does in the additive case. Note from
Equation 67 that $e'_i = 1$ when $e_i = 0$ (from $e'_i = 10^{e_i}$).
This implies a log normal distribution function, meaning that the
logarithms of the errors are normally distributed.
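A minimal sketch of the transformed model for a single covariate is
given below. It assumes, for illustration only, a relation of the
form y = c x^b with multiplicative error, where the constant c plays
the part of a covariate held fixed across observations; the function
name and the data are hypothetical.

```python
import math


def fit_power_law(x, y):
    """Fit y = c * x**b by least squares on log10(y) = log10(c) + b*log10(x)."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx
    log_c = mean_y - b * mean_x
    # Residuals on the log scale are the additive errors e_i;
    # 10**e_i are the corresponding multiplicative errors e'_i.
    mult_errors = [10 ** (v - (log_c + b * u)) for u, v in zip(lx, ly)]
    return 10 ** log_c, b, mult_errors


# Hypothetical data: y = 2 * x**1.5 disturbed by multiplicative factors
# chosen so that their logarithms are orthogonal to the fitted terms.
x = [1.0, 2.0, 4.0, 8.0]
factors = [1.1, 1 / 1.1, 1 / 1.1, 1.1]
y = [2.0 * xi ** 1.5 * f for xi, f in zip(x, factors)]
c, b, mult_errors = fit_power_law(x, y)
```

The recovered multiplicative errors scatter about 1 rather than about
zero, which is the behavior described for the transformed model.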
SECTION VIII
CONCLUSIONS

Given a set of observations on a set of random variables an
orderly data processing procedure would be precisely the sequence
of operations given in the preceding sections of this report.

a. Compute univariate statistics giving some measure of average
value, dispersion, skewness, and kurtosis for each of the random
variables.

b. Select the probability density function associated with
each variable by matching the computed univariate statistics with
those tabulated for specific mathematical functions.

c. Conduct statistical t and F tests to determine if significant
differences exist in the means and variances of random variables
measured at different locations or under different test conditions.

d. Compute the bivariate correlations between each pair of
random variables, arrange them in a correlation matrix, and then
compute the multiple, marginal, conditional, and canonical
correlations of particular interest. Perform a factor analysis on
the correlation matrix to determine the structure of interrelations
among the variables and how each variable may be expressed in terms
of a smaller number of underlying factors.

e. Formulate mathematical models quantifying the precise
relationship between any dependent variable and a particular set of
independent variables or factors.
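APPENDIX A

COEFFICIENT OF RANK CORRELATION


Given n bivariate observations let $x_i$, $y_i$ represent respectively
the rank of the ith observation of x among all x's and the ith
observation of y among all y's. Then we have

$$\sum x_i = \sum y_i = n(n+1)/2$$

$$\sum x_i^2 = \sum y_i^2 = n(n+1)(2n+1)/6$$

$$\sum (x_i - y_i)^2 = \sum x_i^2 + \sum y_i^2 - 2 \sum x_i y_i$$

$$\sum x_i y_i = \tfrac{1}{2} \Bigl[ \sum x_i^2 + \sum y_i^2 - \sum (x_i - y_i)^2 \Bigr]$$

$$\sum x_i y_i = \tfrac{1}{12} \Bigl[ 2n(n+1)(2n+1) - 6 \sum (x_i - y_i)^2 \Bigr]$$

$$r = \frac{(\sum x_i y_i/n) - (\sum x_i/n)(\sum y_i/n)}{\sqrt{(\sum x_i^2/n) - (\sum x_i/n)^2}\,\sqrt{(\sum y_i^2/n) - (\sum y_i/n)^2}}$$

and, if we let $d_i = x_i - y_i$,

$$r = \frac{\{[2n(n+1)(2n+1) - 6 \sum d_i^2]/12n\} - [n(n+1)/2n]^2}{[n(n+1)(2n+1)/6n] - [n(n+1)/2n]^2}$$

$$r = \frac{[(4n^3 + 6n^2 + 2n - 6 \sum d_i^2)/12n] - [(n^2 + 2n + 1)/4]}{[(2n^3 + 3n^2 + n)/6n] - [(n^2 + 2n + 1)/4]}$$

$$r = \frac{4n^3 + 6n^2 + 2n - 6 \sum d_i^2 - 3n^3 - 6n^2 - 3n}{4n^3 + 6n^2 + 2n - 3n^3 - 6n^2 - 3n}$$

$$r = \frac{n^3 - n - 6 \sum d_i^2}{n^3 - n} = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$

This is the coefficient of rank correlation.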
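As a numerical check on this derivation, the short form in terms of
the rank differences can be compared with the ordinary product-moment
correlation computed on the ranks. The sketch below assumes untied
observations; the function names and the data are hypothetical.

```python
def ranks(values):
    """Ranks 1..n of the values, smallest first; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def pearson(x, y):
    """Product-moment correlation using divisor n throughout."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum(a * b for a, b in zip(x, y)) / n - mx * my
    sxx = sum(a * a for a in x) / n - mx * mx
    syy = sum(b * b for b in y) / n - my * my
    return sxy / (sxx ** 0.5 * syy ** 0.5)


def rank_correlation(x, y):
    """1 - 6*sum(d^2)/(n^3 - n), with d the difference in ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n ** 3 - n)


# Hypothetical observations without ties.
x = [10.2, 3.1, 7.7, 5.0, 8.4]
y = [2.0, 1.5, 9.9, 4.2, 3.3]
r_short = rank_correlation(x, y)
r_full = pearson(ranks(x), ranks(y))
```

For these data the squared rank differences sum to 18, so both
computations give 1 - 108/120 = 0.1.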
APPENDIX B

POINT BISERIAL CORRELATION


Consider n bivariate observations $x_i$, $y_i$, the first of which is
either one or zero according to whether a given attribute is or is not
present when the y measurement is taken. To indicate these two
conditions, zero and one subscripts are used with the measurement y
and the sample size n in the following derivation

$$r = \frac{(\sum x_i y_i/n) - (\sum x_i/n)(\sum y_i/n)}{\sqrt{(\sum x_i^2/n) - (\sum x_i/n)^2}\,\sqrt{(\sum y_i^2/n) - (\sum y_i/n)^2}}$$

$$r = \frac{(\sum y_{1i}/n) - (n_1)(\sum y_{0i} + \sum y_{1i})/n^2}{\sqrt{(n_1/n) - (n_1/n)^2}\,\sqrt{\overline{y^2} - \bar{y}^2}}$$

$$r = \frac{(n_1/n)\,\bar{y}_1 - (n_0 n_1/n^2)\,\bar{y}_0 - (n_1^2/n^2)\,\bar{y}_1}{\sqrt{(n_1/n)(1 - n_1/n)}\,\sqrt{\overline{y^2} - \bar{y}^2}}$$
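Carrying the last line one step further, with $n_0 = n - n_1$, collects
the numerator into $(n_0 n_1/n^2)(\bar{y}_1 - \bar{y}_0)$ and gives
$r = (\bar{y}_1 - \bar{y}_0)\sqrt{n_0 n_1}\,/\,(n\, s_y)$, where
$s_y = \sqrt{\overline{y^2} - \bar{y}^2}$. This closing step is not
legible in the text above and is supplied here as an assumption about
where the derivation ends. The sketch below, with hypothetical
function names and data, checks that form against the product-moment
correlation computed directly on the 0/1 scores.

```python
def point_biserial(x, y):
    """x holds 0/1 attribute flags, y the corresponding measurements."""
    n = len(y)
    y1 = [v for f, v in zip(x, y) if f == 1]
    y0 = [v for f, v in zip(x, y) if f == 0]
    n1, n0 = len(y1), len(y0)
    mean1, mean0 = sum(y1) / n1, sum(y0) / n0
    mean_y = sum(y) / n
    # Standard deviation of all y with divisor n, as in the derivation.
    sd_y = (sum(v * v for v in y) / n - mean_y ** 2) ** 0.5
    return (mean1 - mean0) * (n0 * n1) ** 0.5 / (n * sd_y)


def pearson(x, y):
    """Product-moment correlation using divisor n throughout."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum(a * b for a, b in zip(x, y)) / n - mx * my
    sxx = sum(a * a for a in x) / n - mx * mx
    syy = sum(b * b for b in y) / n - my * my
    return sxy / (sxx ** 0.5 * syy ** 0.5)


# Hypothetical data: attribute present (1) or absent (0) for each measurement.
x = [1, 0, 1, 1, 0, 0]
y = [5.0, 2.0, 6.0, 7.0, 3.0, 1.0]
r_short = point_biserial(x, y)
r_full = pearson(x, y)
```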
APPENDIX C

TETRACHORIC CORRELATION

Consider n bivariate observations $x_i$, $y_i$, each member of which
is either one or zero according to whether each of a pair of
attributes is or is not present. The sample may be subdivided as
follows

a = number of observations for which x = 0, y = 0

b = number of observations for which x = 1, y = 0

c = number of observations for which x = 0, y = 1

d = number of observations for which x = 1, y = 1

Then n = a + b + c + d

$$\sum x_i = \sum x_i^2 = b + d, \qquad \sum y_i = \sum y_i^2 = c + d, \qquad \sum x_i y_i = d$$

$$r = \frac{(ad + bd + cd + d^2) - (bc + bd + cd + d^2)}{\sqrt{(a+b+c+d)(b+d) - (b+d)^2}\,\sqrt{(a+b+c+d)(c+d) - (c+d)^2}}$$

$$r = \frac{ad + bd + cd + d^2 - bc - bd - cd - d^2}{\sqrt{[(b+d) + (a+c)](b+d) - (b+d)^2}\,\sqrt{[(a+b) + (c+d)](c+d) - (c+d)^2}}$$

$$r = \frac{ad - bc}{\sqrt{(a+c)(b+d)}\,\sqrt{(a+b)(c+d)}}$$

This is the tetrachoric coefficient of correlation.
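The closed form in a, b, c, and d can be checked against the
product-moment correlation of the underlying 0/1 scores (this closed
form is also widely known as the phi coefficient). The sketch below
is an illustration only; the function names and the cell counts are
hypothetical.

```python
def fourfold_r(a, b, c, d):
    """(ad - bc) / sqrt((a+c)(b+d)(a+b)(c+d)) for the four cell counts."""
    num = a * d - b * c
    den = ((a + c) * (b + d) * (a + b) * (c + d)) ** 0.5
    return num / den


def pearson(x, y):
    """Product-moment correlation using divisor n throughout."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum(p * q for p, q in zip(x, y)) / n - mx * my
    sxx = sum(p * p for p in x) / n - mx * mx
    syy = sum(q * q for q in y) / n - my * my
    return sxy / (sxx ** 0.5 * syy ** 0.5)


# Hypothetical cell counts, using the definitions of a, b, c, d given above.
a, b, c, d = 3, 1, 2, 4
# Expand the counts into individual (x, y) observations.
pairs = [(0, 0)] * a + [(1, 0)] * b + [(0, 1)] * c + [(1, 1)] * d
xs = [p[0] for p in pairs]
ys = [p[1] for p in pairs]
r_cells = fourfold_r(a, b, c, d)
r_scores = pearson(xs, ys)
```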
APPENDIX D

COEFFICIENT OF CONTINGENCY


If we let $f_{ij}$ be the actual number of observations in row i and
column j of a contingency table and $f_{0ij}$ be the expected number
of observations in the same position assuming no interdependence,
then the quantity chi square ($\chi^2$) is given by the sum

$$\chi^2 = \sum_i \sum_j \frac{(f_{ij} - f_{0ij})^2}{f_{0ij}}$$

$$\chi^2 = \sum_i \sum_j \frac{f_{ij}^2 + f_{0ij}^2 - 2 f_{ij} f_{0ij}}{f_{0ij}}$$

$$\chi^2 = \sum_i \sum_j \Bigl( \frac{f_{ij}^2}{f_{0ij}} + f_{0ij} - 2 f_{ij} \Bigr)$$

$$\chi^2 = \sum_i \sum_j \frac{f_{ij}^2}{f_{0ij}} + n - 2n$$

$$\chi^2 = \sum_i \sum_j \frac{f_{ij}^2}{f_{0ij}} - n$$

However $f_{0ij} = r_i c_j / n$ where $r_i$ is the total number of
observations in row i and $c_j$ is the total number of observations
in column j. Therefore

$$\chi^2 = \sum_i \sum_j \frac{f_{ij}^2}{r_i c_j / n} - n = \sum_i \sum_j \frac{n f_{ij}^2}{r_i c_j} - n$$

$$\frac{\chi^2}{n} = \sum_i \sum_j \frac{f_{ij}^2}{r_i c_j} - 1$$

Using this expression for $\chi^2$ in the definition of the
coefficients of contingency gives the following result:


These  are  the  alternate  definitions  of  the  coefficient  of  contingency 
as  given  in  Equation  29. 
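The identity for chi square derived in this appendix can be checked
numerically by computing the statistic both from its definition and
from the row and column totals. The sketch below is an illustration
under stated assumptions: the function name and the table are
hypothetical, and the coefficient of contingency itself is not
computed because Equation 29 is not reproduced here.

```python
def chi_square_two_ways(table):
    """Return chi square from the definition and from the shortcut form.

    table is a list of rows of observed counts f_ij.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    direct = 0.0
    shortcut_sum = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # f_0ij = r_i c_j / n
            direct += (f - expected) ** 2 / expected
            shortcut_sum += f * f / (row_totals[i] * col_totals[j])
    return direct, n * (shortcut_sum - 1)


# Hypothetical 2 x 2 table of observed counts.
table = [[10, 20], [30, 40]]
chi2_direct, chi2_shortcut = chi_square_two_ways(table)
```

The shortcut form needs only the observed counts and the marginal
totals, so the expected frequencies never have to be tabulated.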
APPENDIX E

SUMS OF SQUARES IN ANALYSIS OF VARIANCE

In mathematical models for statistical data, measurements are
resolved into two components: explained by regression and
unexplained error. The relationships among the sums of squares of
these components are indicated in the figure below. Projections first
onto the plane and then onto the line symbolize regression functions
with a limited and then further reduced number of variables.


Vector OT - represents the measurement sum of squares SSM. Each
measurement is squared and then all are summed.

Vector OR - the projection of OT on the plane, represents the
regression sum of squares SSR. Each regression estimate is squared
and then all are summed.

Vector OS - the projection of both OR and OT on the line, represents
the hypothesis regression sum of squares for a reduced number of
variables SSRH. Each such reduced regression estimate is squared and
then all are summed.

Vector SR - the difference between OR and OS, represents the
hypothesis sum of squares SSH. The difference between each regression
and reduced regression estimate is squared and then all are summed.
In a simple one-way analysis of variance this is the among-groups sum
of squares.

Vector RT - the difference between OT and OR, represents the error
sum of squares SSE. The difference between each measurement and its
regression estimate is squared and then all are summed. In a simple
one-way analysis of variance this is the within-groups sum of squares.

Vector ST - the difference between OT and OS, and also the sum of SR
and RT, represents the total sum of squares, SST. The difference
between each measurement and its reduced regression estimate is
squared and then all are summed. Alternatively SST = SSH + SSE, this
relation being the reason for the name "total sum of squares".


As a numerical illustration of these sums of squares consider
the following one-way analysis of variance of four groups of six
observations each. In each group the observed measurement on the
right of the equal sign is shown as the sum of a common value plus a
group effect plus an error term.


Group  1 

80+60+2  = 142 
80+60+2  = 142 
80+60+1  = 141 
80+60-1  = 139 
80+60-1  = 139 
80+60-3  = 137 


Group  2 

80+0+11  = 91 
80+0+  9 = 89 
80+0+  2 = 82 
80+0-  2 = 78 
80+0-  9 = 71 
80+0-11  = 69


Group  3 

80-20+14  = 74 
80-20+12  = 72 
80-20+  1 = 61 
80-20-  1 = 59 
80-20-13  = 47 
80-20-13  = 47 


Group  4 

80-40+16  = 56 
80-40+14  = 54 
80-40+  1 = 41 
80-40-  1 = 39 
80-40-15  = 25 
80-40-15  = 25 
Note that the error values sum to zero within groups and the group
effects sum to zero across groups. The squares of these numbers are
tabulated below. Within each group the four columns are the squares
of the common value, the group effect, the error term, and the
observed measurement.

Group 1                  Group 2                  Group 3                  Group 4
6400  3600    4  20164   6400     0  121   8281   6400   400  196   5476   6400  1600  256   3136
6400  3600    4  20164   6400     0   81   7921   6400   400  144   5184   6400  1600  196   2916
6400  3600    1  19881   6400     0    4   6724   6400   400    1   3721   6400  1600    1   1681
6400  3600    1  19321   6400     0    4   6084   6400   400    1   3481   6400  1600    1   1521
6400  3600    1  19321   6400     0   81   5041   6400   400  169   2209   6400  1600  225    625
6400  3600    9  18769   6400     0  121   4761   6400   400  169   2209   6400  1600  225    625

Summing corresponding columns for each group gives

24(6400) + 6(3600 + 0 + 400 + 1600) + 2,016 = 189,216
153,600 + 33,600 + 2,016 = 189,216

SSH  = 33,600                               Among-groups or hypothesis sum of squares
SSE  = 2,016                                Within-groups or error sum of squares
SST  = 33,600 + 2,016 = 35,616              Total sum of squares
SSRH = 153,600                              Reduced regression sum of squares
SSR  = 153,600 + 33,600 = 187,200           Regression sum of squares
SSM  = 153,600 + 33,600 + 2,016 = 189,216   Measurement sum of squares


All  sums  of  squares  are  thus  obtainable  from  simple  summations 
for  this  special  case  in  which  SSH  is  simply  the  sum  of  the  squared 
group  effects.  However,  in  more  general  cases  (unequal  group  sample 
sizes,  for  example),  the  value  of  the  common  term  changes  when  group 
effects  are  assumed  to  be  zero  and  the  effect  of  this  difference  must 
also  be  a part  of  SSH. 
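The sums of squares in this illustration can be recomputed directly
from the 24 observations. The sketch below is a check on the
arithmetic rather than a general analysis of variance program; the
variable names are hypothetical, and the choice of m = 4 coefficients
(one mean per group) with k = 3 of them removed under the hypothesis
is an assumption about how the example maps onto the variance ratio
test of Section VII.

```python
groups = [
    [142, 142, 141, 139, 139, 137],
    [91, 89, 82, 78, 71, 69],
    [74, 72, 61, 59, 47, 47],
    [56, 54, 41, 39, 25, 25],
]

n = sum(len(g) for g in groups)      # 24 observations
m = len(groups)                      # coefficients in the full model (assumed)
k = m - 1                            # group effects set to zero (assumed)
grand_mean = sum(sum(g) for g in groups) / n
group_means = [sum(g) / len(g) for g in groups]

# Measurement sum of squares: every observation squared and summed.
ssm = sum(y * y for g in groups for y in g)
# Reduced regression: every observation estimated by the common value.
ssrh = n * grand_mean ** 2
# Regression: every observation estimated by its group mean.
ssr = sum(len(g) * gm ** 2 for g, gm in zip(groups, group_means))
# Hypothesis (among groups), error (within groups), and total.
ssh = sum(len(g) * (gm - grand_mean) ** 2 for g, gm in zip(groups, group_means))
sse = sum((y - gm) ** 2 for g, gm in zip(groups, group_means) for y in g)
sst = sum((y - grand_mean) ** 2 for g in groups for y in g)

f_ratio = (ssh / k) / (sse / (n - m))
```

Under these assumptions the computed F ratio is (33,600/3)/(2,016/20),
or about 111.1, on 3 and 20 degrees of freedom.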